Abstract. Sequential change diagnosis is the joint problem of detection and identification of a sudden and unobservable change in the distribution of a random sequence. In this problem, the common probability law of a sequence of i.i.d. random variables suddenly changes at some disorder time to one of finitely many alternatives. This disorder time marks the start of a new regime, whose fingerprint is the new law of observations. Both the disorder time and the identity of the new regime are unknown and unobservable. The objective is to detect the regime change as soon as possible and, at the same time, to determine its identity as accurately as possible. Prompt and correct diagnosis is crucial for quick execution of the most appropriate measures in response to the new regime, as in fault detection and isolation in industrial processes, and target detection and identification in national defense. The problem is formulated in a Bayesian framework. An optimal sequential decision strategy is found, and an accurate numerical scheme is described for its implementation. Geometrical properties of the optimal strategy are illustrated via numerical examples. The traditional problems of Bayesian change detection and Bayesian sequential multi-hypothesis testing are solved as special cases. In addition, a solution is obtained for the problem of detection and identification of component failure(s) in a system with suspended animation.

1. Introduction 

Sequential change diagnosis is the joint problem of detection and identification of a sudden change in the distribution of a random sequence. In this problem, one observes a sequence of i.i.d. random variables $X_1, X_2, \ldots$, taking values in some measurable space $(E, \mathcal{E})$. The common probability distribution of the $X$'s is initially some known probability measure $P_0$ on $(E, \mathcal{E})$, and, in the terminology of statistical process control, the system is said to be "in control." Then, at some unknown and unobservable disorder time $\theta$, the common probability distribution changes suddenly to another probability measure $P_\mu$ for some unknown and unobservable index $\mu \in \mathcal{M} = \{1, \ldots, M\}$, and the system goes "out of control." The objective is to detect the change as quickly as possible and, at the same time, to identify the new probability distribution as accurately as possible, so that the most suitable actions can be taken with the least delay.

Decision strategies for this problem have a wide array of applications, such as fault de- 
tection and isolation in industrial processes, target detection and identification in national 
defense, pattern recognition and machine learning, radar and sonar signal processing, seis- 
mology, speech and image processing, biomedical signal processing, finance, and insurance. 
For example, suppose we perform a quality test on each item produced from a manufacturing
process consisting of several complex processing components (labeled 1, 2, ... , M). As long 
as each processing component is operating properly, we can expect the distribution of our 
quality test statistic to be stationary. Now, if there occurs a sudden fault in one of the pro- 
cessing components, this can change the distribution of our quality test statistic depending 
on the processing component which caused the fault. It may be costly to continue manufac- 
ture of the items at a substandard quality level, so we must decide when to (temporarily) 
shut down the manufacturing process and repair the fault. However, it may also be expensive 
to dissect each and every processing component in order to identify the source of the failure 
and to fix it. So, not only do we want to detect quickly when a fault happens, but, at the 
same time we want also to identify accurately which processing component is the cause. The 
time and the cause of the fault will be distributed independently according to a geometric 
and a finite distribution, respectively, if each component fails independently according to 
some geometric distributions, which is a reasonable assumption for highly reliable components; see Section 5.5. As another example, an insurance company may monitor reported
claims not only to detect a change in its risk exposure, but also to assess the nature of the 
change so that it can adjust its premium schedule or re-balance appropriately its portfolio 
of reserves to hedge against a different distribution of loss scenarios. 

Sequential change diagnosis can be viewed as the fusion of two fundamental areas of sequential analysis: change detection and multi-hypothesis testing. In traditional change detection problems, $M = 1$ and there is only one change distribution, $P_1$; therefore, the focus is exclusively on detecting the change time. In traditional sequential multi-hypothesis testing problems, by contrast, there is no change time to consider. Instead, every observation has common distribution $P_\mu$ for some unknown $\mu$, and the focus is exclusively on the inference of $\mu$. Both change detection and sequential multi-hypothesis testing have been studied extensively. For recent reviews of these areas, we refer the reader to Basseville and Nikiforov [5], Dragalin, Tartakovsky, and Veeravalli [9, 10], and Lai [14], and the references therein.

However, the sequential change diagnosis problem involves key trade-off decisions not taken 
into account by separately applying techniques for change detection and sequential multi- 
hypothesis testing. While raising an alarm as soon as the change occurs is advantageous 
for the change detection task, it is undesirable for the isolation task because the longer 
one waits to raise the alarm, the more observations one has to use for inferring the change 
distribution. Moreover, the unknown change time complicates the isolation task, and, as a 
result, adaptation of existing sequential multi-hypothesis testing algorithms is problematic. 

The theory of sequential change diagnosis has not been broadly developed. Nikiforov [16] provides the first results for this problem, showing asymptotic optimality for a certain non-Bayesian approach, and Lai [13] generalizes these results through the development of information-theoretic bounds and the application of likelihood methods. In this paper, we follow a Bayesian approach to derive a new sequential decision strategy for this problem, which incorporates a priori knowledge regarding the distributions of the change time $\theta$ and of the change index $\mu$. We prove that this strategy is optimal, and we describe an accurate numerical scheme for its implementation.

In Section 2 we formulate the problem precisely in a Bayesian framework, and in Section 3 we show that it can be reduced to an optimal stopping problem for a Markov process whose state space is the standard probability simplex. In addition, we establish a simple recursive formula that captures the dynamics of the process and yields a sufficient statistic fit for online tracking.

In Section 4 we use optimal stopping theory to substantiate the optimality equation for the value function of the optimal stopping problem. Moreover, we prove that this value function is bounded, concave, and continuous on the standard probability simplex. Furthermore, we prove that the optimal decision strategy uses a finite number of observations on average, and we establish some important characteristics of the associated optimal stopping/decision region. In particular, we show that the optimal stopping region of the state space for the problem consists of $M$ non-empty, convex, closed, and bounded subsets. Also, we consider a truncated version of the problem that allows at most $N$ observations from the sequence of random measurements. We establish an explicit bound (inversely proportional to $N$) for the approximation error associated with this truncated problem.

In Section 5 we show that the separate problems of change detection and sequential multi-hypothesis testing are solved as special cases of the overall joint solution. We illustrate some geometrical properties of the optimal method and demonstrate its implementation by numerical examples for the special cases $M = 2$ and $M = 3$. Specifically, we show instances in which the $M$ convex subsets comprising the optimal stopping region are connected and instances in which they are not. Likewise, we show that the continuation region (i.e., the complement of the stopping region) need not be connected. We also provide a solution to the problem of detection and identification of component failure(s) in a system with suspended animation. Finally, we outline in Section 6 how the change-diagnosis algorithm may be implemented with a computer in general. Proofs of most results are deferred to the Appendix.

2. Problem statement 

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space hosting random variables $\theta : \Omega \to \{0, 1, \ldots\}$ and $\mu : \Omega \to \mathcal{M} = \{1, \ldots, M\}$ and a process $X = (X_n)_{n \ge 1}$ taking values in some measurable space $(E, \mathcal{E})$. Suppose that for every $t \ge 1$, $i \in \mathcal{M}$, $n \ge 1$, and $(E_k)_{k=1}^{n} \subseteq \mathcal{E}$,

$$\mathbb{P}\{\theta = t, \mu = i, X_1 \in E_1, \ldots, X_n \in E_n\} = (1 - p_0)(1 - p)^{t-1} p\, \nu_i \prod_{1 \le k \le (t-1) \wedge n} P_0(E_k) \prod_{t \vee 1 \le \ell \le n} P_i(E_\ell) \tag{2.1}$$

for some given probability measures $P_0, P_1, \ldots, P_M$ on $(E, \mathcal{E})$, known constants $p_0 \in [0, 1]$, $p \in (0, 1)$, and $\nu_i > 0$, $i \in \mathcal{M}$, such that $\nu_1 + \cdots + \nu_M = 1$, where $x \wedge y = \min\{x, y\}$ and $x \vee y = \max\{x, y\}$. Namely, $\theta$ is independent of $\mu$; it has a zero-modified geometric distribution with parameters $p_0$ and $p$ in the terminology of Klugman, Panjer, and Willmot [12, Sec. 3.6], which reduces to the standard geometric distribution with success probability $p$ when $p_0 = 0$. Moreover, $\nu_i$ is the probability that the change type $\mu$ is $i$ for every $i = 1, \ldots, M$.

Conditionally on $\theta$ and $\mu$, the random variables $X_n$, $n \ge 1$, are independent; $X_1, \ldots, X_{\theta-1}$ and $X_\theta, X_{\theta+1}, \ldots$ are identically distributed with common distributions $P_0$ and $P_\mu$, respectively. The probability measures $P_0, P_1, \ldots, P_M$ always admit densities with respect to some $\sigma$-finite measure $m$ on $(E, \mathcal{E})$; for example, we can take $m = P_0 + P_1 + \cdots + P_M$. So we fix $m$ and denote the corresponding densities by $f_0, f_1, \ldots, f_M$, respectively.
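To make the generative model (2.1) concrete, here is a minimal Python sketch that samples $(\theta, \mu, X_1, \ldots, X_n)$. It assumes, for illustration only, a finite observation space $E = \{0, \ldots, K-1\}$ with $m$ the counting measure, so that each density $f_i$ is a probability mass function; the function names `sample_theta` and `sample_model` are hypothetical.

```python
import numpy as np

def sample_theta(p0, p, rng):
    """Zero-modified geometric: P{theta=0} = p0, P{theta=t} = (1-p0)(1-p)^(t-1) p for t >= 1."""
    if rng.random() < p0:
        return 0
    return int(rng.geometric(p))  # numpy's geometric distribution has support {1, 2, ...}

def sample_model(n, p0, p, nu, f, rng):
    """Draw (theta, mu, X_1..X_n) from model (2.1) on a finite space E = {0, ..., K-1}.

    f  : (M+1) x K array whose row i is the pmf f_i (density w.r.t. counting measure).
    nu : length-M prior of the change index mu in {1, ..., M}.
    """
    theta = sample_theta(p0, p, rng)
    mu = 1 + int(rng.choice(len(nu), p=nu))  # drawn independently of theta
    K = f.shape[1]
    x = np.empty(n, dtype=int)
    for k in range(1, n + 1):
        row = 0 if k < theta else mu  # X_k ~ f_0 before stage theta, ~ f_mu from stage theta on
        x[k - 1] = rng.choice(K, p=f[row])
    return theta, mu, x
```

Simulated paths of this kind can be fed to the posterior recursion of Section 3 to check an implementation.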

Suppose now that we observe sequentially the random variables $X_n$, $n \ge 1$. Their common probability density function $f_0$ changes at stage $\theta$ to some other probability density function $f_\mu$, $\mu \in \mathcal{M}$. Our objective is to detect the change time $\theta$ as quickly as possible and to isolate the change index $\mu$ as accurately as possible. More precisely, given costs associated with detection delay, false alarm, and false isolation of the change index, we seek a strategy that minimizes the expected total change detection and isolation cost.

In view of the fact that the observations arrive sequentially, we are interested in sequential diagnosis schemes. Specifically, let $\mathbb{F} = (\mathcal{F}_n)_{n \ge 0}$ denote the natural filtration of the observation process $X$, where

$$\mathcal{F}_0 = \{\emptyset, \Omega\} \quad\text{and}\quad \mathcal{F}_n = \sigma(X_1, \ldots, X_n), \quad n \ge 1.$$

A sequential decision strategy $\delta = (\tau, d)$ is a pair consisting of a stopping time (or stopping rule) $\tau$ of the filtration $\mathbb{F}$ and a terminal decision rule $d : \Omega \to \mathcal{M}$ measurable with respect to the history $\mathcal{F}_\tau = \sigma(X_{n \wedge \tau}; n \ge 1)$ of the observation process $X$ through stage $\tau$. Applying a sequential decision strategy $\delta = (\tau, d)$ consists of announcing at the end of stage $\tau$ that the common probability density function has changed from $f_0$ to $f_d$ at or before stage $\tau$. Let

$$\Delta \triangleq \{(\tau, d) \mid \tau \in \mathbb{F}, \text{ and } d \in \mathcal{F}_\tau \text{ is an } \mathcal{M}\text{-valued random variable}\}$$

denote the collection of all such sequential decision strategies ("$\tau \in \mathbb{F}$" means that $\tau$ is a stopping time of the filtration $\mathbb{F}$). Let us specify the possible losses associated with a sequential decision strategy $\delta = (\tau, d) \in \Delta$ as follows:

(i) Detection delay loss. Let us denote by a fixed positive constant $c$ the detection delay cost per period. Then the expected detection delay cost for $\delta$ is $\mathbb{E}[c(\tau - \theta)^+]$, possibly infinite, where $(x)^+ = \max\{x, 0\}$.

(ii) Terminal decision loss. Here we identify two cases of isolation loss depending on whether or not the change has actually occurred at or before the stage in which we announce the isolation decision:

(a) Loss due to false alarm. Let us denote by $a_{0j}$ the isolation cost on $\{\tau < \theta, d = j\}$ for every $j \in \mathcal{M}$. Then the expected false alarm cost for $\delta$ is $\mathbb{E}[a_{0d} 1_{\{\tau < \theta\}}]$.

(b) Loss due to false isolation. Let us denote by $a_{ij}$ the isolation cost on the event $\{\theta \le \tau < \infty, d = j, \mu = i\}$ for every $i, j \in \mathcal{M}$. Then the expected false isolation cost for $\delta$ is $\mathbb{E}[a_{\mu d} 1_{\{\theta \le \tau < \infty\}}]$.

Here, $a_{ij}$, $i \in \{0\} \cup \mathcal{M}$, $j \in \mathcal{M}$, are known nonnegative constants, and $a_{ii} = 0$ for every $i \in \mathcal{M}$; i.e., no cost is incurred for making a correct terminal decision.

Accordingly, for every sequential decision strategy $\delta = (\tau, d) \in \Delta$, we define a Bayes risk function

$$R(\delta) \triangleq c\,\mathbb{E}[(\tau - \theta)^+] + \mathbb{E}\big[a_{0d} 1_{\{\tau < \theta\}} + a_{\mu d} 1_{\{\theta \le \tau < \infty\}}\big] \tag{2.2}$$

as the expected diagnosis cost: the sum of the expected detection delay cost and the expected terminal decision cost upon alarm. The problem is to find a sequential decision strategy $\delta = (\tau, d) \in \Delta$ (if it exists) with the minimum Bayes risk

$$R^* \triangleq \inf_{\delta \in \Delta} R(\delta). \tag{2.3}$$
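The Bayes risk (2.2) is the expectation of a per-scenario loss. The Python sketch below evaluates that realized loss for one scenario $(\theta, \mu)$ and one outcome $(\tau, d)$; the function name `realized_loss` and the array layout of the costs are our own illustrative choices.

```python
import math

def realized_loss(tau, d, theta, mu, c, a):
    """Realized diagnosis cost of announcing 'change to f_d' at stage tau.

    a is an (M+1) x M cost table indexed as a[i][j-1] for i in {0,...,M}, j in {1,...,M}:
    row 0 holds the false-alarm costs a_{0j}, rows 1..M the isolation costs a_{ij}.
    tau = math.inf means that no alarm is ever raised.
    """
    delay = c * max(tau - theta, 0)       # c (tau - theta)^+, infinite if tau = inf
    if math.isinf(tau):
        return delay                      # terminal cost is charged only on {tau < infinity}
    if tau < theta:
        return delay + a[0][d - 1]        # false alarm: a_{0d} on {tau < theta}
    return delay + a[mu][d - 1]           # isolation: a_{mu d} on {theta <= tau < infinity}
```

Averaging this quantity over many draws from model (2.1) would give a Monte Carlo estimate of $R(\delta)$ for a fixed strategy $\delta$.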



3. Posterior analysis and formulation as an optimal stopping problem 



In this section we show that the Bayes risk function in (2.2) can be written as the expected value of the running and terminal costs driven by a certain Markov process. We use this fact to recast the minimum Bayes risk in (2.3) as a Markov optimal stopping problem.



Let us introduce the posterior probability processes

$$\Pi_n^{(0)} \triangleq \mathbb{P}\{\theta > n \mid \mathcal{F}_n\} \quad\text{and}\quad \Pi_n^{(i)} \triangleq \mathbb{P}\{\theta \le n, \mu = i \mid \mathcal{F}_n\}, \quad i \in \mathcal{M}, \; n \ge 0.$$

Having observed the first $n$ observations, $\Pi_n^{(0)}$ is the posterior probability that the change has not yet occurred at or before stage $n$, while $\Pi_n^{(i)}$ is the posterior joint probability that the change has occurred by stage $n$ and that the hypothesis $\mu = i$ is correct. The connection of these posterior probabilities to the loss structure for our problem is established in the next proposition.



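Proposition 3.1. For every sequential decision strategy $\delta = (\tau, d) \in \Delta$, the Bayes risk function (2.2) can be expressed in terms of the process $\Pi \triangleq \{\Pi_n = (\Pi_n^{(0)}, \ldots, \Pi_n^{(M)})\}_{n \ge 0}$ as

$$R(\delta) = \mathbb{E}\left[\sum_{n=0}^{\tau-1} c\,\big(1 - \Pi_n^{(0)}\big) + 1_{\{\tau < \infty\}} \sum_{j=1}^{M} 1_{\{d = j\}} \sum_{i=0}^{M} \Pi_\tau^{(i)} a_{ij}\right].$$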
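The two terms of Proposition 3.1 can be checked directly. The display below is our own expansion of one step, not the paper's proof; it uses only Tonelli's theorem, the tower property, and the facts that $\{n < \tau\} \in \mathcal{F}_n$ and $\{\tau = n, d = j\} \in \mathcal{F}_n$.

```latex
\mathbb{E}[(\tau-\theta)^+]
  = \mathbb{E}\sum_{n=0}^{\infty} 1_{\{\theta \le n < \tau\}}
  = \sum_{n=0}^{\infty} \mathbb{E}\big[1_{\{n < \tau\}}\,\mathbb{P}\{\theta \le n \mid \mathcal{F}_n\}\big]
  = \mathbb{E}\sum_{n=0}^{\tau-1}\big(1 - \Pi_n^{(0)}\big),

\mathbb{E}\big[a_{0d}1_{\{\tau<\theta\}} + a_{\mu d}1_{\{\theta\le\tau<\infty\}}\big]
  = \sum_{n=0}^{\infty}\sum_{j=1}^{M}
    \mathbb{E}\Big[1_{\{\tau=n,\,d=j\}}\sum_{i=0}^{M} a_{ij}\,\Pi_n^{(i)}\Big].
```

In the second line, the $i = 0$ term accounts for the false alarm cost and the terms $i \in \mathcal{M}$ for the isolation cost.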



While our original formulation of the Bayes risk function (2.2) was in terms of the values of the unobservable random variables $\theta$ and $\mu$, Proposition 3.1 gives an equivalent version of the Bayes risk function in terms of the posterior distributions for $\theta$ and $\mu$. This is particularly effective in light of Proposition 3.2, which we state with the aid of some additional notation that is referred to throughout the paper. Let



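$$S^M \triangleq \Big\{\pi = (\pi_0, \pi_1, \ldots, \pi_M) \in [0, 1]^{M+1} \;\Big|\; \sum_{i=0}^{M} \pi_i = 1\Big\}$$

denote the standard $M$-dimensional probability simplex. Define the mappings $D_i : S^M \times E \to \mathbb{R}_+$, $i \in \{0\} \cup \mathcal{M}$, and $D : S^M \times E \to \mathbb{R}_+$ by

$$D_i(\pi, x) \triangleq \begin{cases} (1 - p)\,\pi_0 f_0(x), & i = 0, \\ (\pi_i + \pi_0\, p\, \nu_i)\, f_i(x), & i \in \mathcal{M}, \end{cases} \qquad D(\pi, x) \triangleq \sum_{i=0}^{M} D_i(\pi, x), \tag{3.1}$$

and the operator $\mathbb{T}$ on the collection of bounded functions $f : S^M \to \mathbb{R}$ by

$$(\mathbb{T} f)(\pi) \triangleq \int_E m(dx)\, D(\pi, x)\, f\!\left(\frac{D_0(\pi, x)}{D(\pi, x)}, \ldots, \frac{D_M(\pi, x)}{D(\pi, x)}\right) \quad \text{for every } \pi \in S^M. \tag{3.2}$$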
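When $E$ is finite and $m$ is the counting measure, the integral in (3.2) is a finite sum. The Python sketch below, which assumes exactly that setting (the helper names `D_vec` and `T` are hypothetical), computes $(D_0, \ldots, D_M)$ and $(\mathbb{T} g)(\pi)$.

```python
import numpy as np

def D_vec(pi, x, p, nu, f):
    """Vector (D_0(pi, x), ..., D_M(pi, x)) of (3.1) for an observation x in E = {0, ..., K-1}."""
    pi = np.asarray(pi, dtype=float)
    d = np.empty_like(pi)
    d[0] = (1.0 - p) * pi[0] * f[0, x]                           # no change yet at the next stage
    d[1:] = (pi[1:] + pi[0] * p * np.asarray(nu)) * f[1:, x]     # changed to i by the next stage
    return d

def T(g, pi, p, nu, f):
    """(T g)(pi) of (3.2), assuming m is counting measure on a finite E (integral = sum)."""
    total = 0.0
    for x in range(f.shape[1]):
        d = D_vec(pi, x, p, nu, f)
        D = d.sum()
        if D > 0.0:                                              # m-null points contribute nothing
            total += D * g(d / D)
    return total
```

Applying `T` to the coordinate functions gives $(\mathbb{T}\pi_0)(\pi) = (1-p)\pi_0 \le \pi_0$ and $(\mathbb{T}\pi_i)(\pi) = \pi_i + \pi_0 p \nu_i \ge \pi_i$, which is consistent with the supermartingale and submartingale properties in Proposition 3.2(a) and (b).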
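Proposition 3.2. The process $\Pi$ possesses the following properties:

(a) The process $\Pi^{(0)} = \{\Pi_n^{(0)}, \mathcal{F}_n\}_{n \ge 0}$ is a supermartingale, and $\mathbb{E}\,\Pi_n^{(0)} \le (1 - p)^n$ for every $n \ge 0$.

(b) The process $\Pi^{(i)} = \{\Pi_n^{(i)}, \mathcal{F}_n\}_{n \ge 0}$ is a submartingale for every $i \in \mathcal{M}$.

(c) The process $\Pi = \{(\Pi_n^{(0)}, \ldots, \Pi_n^{(M)})\}_{n \ge 0}$ is a Markov process, and

$$\Pi_{n+1}^{(i)} = \frac{D_i(\Pi_n, X_{n+1})}{D(\Pi_n, X_{n+1})}, \quad i \in \{0\} \cup \mathcal{M}, \; n \ge 0, \tag{3.3}$$

with initial state $\Pi_0^{(0)} = 1 - p_0$ and $\Pi_0^{(i)} = p_0 \nu_i$, $i \in \mathcal{M}$. Moreover, for every bounded function $f : S^M \to \mathbb{R}$ and $n \ge 0$, we have $\mathbb{E}[f(\Pi_{n+1}) \mid \Pi_n] = (\mathbb{T} f)(\Pi_n)$.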
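Recursion (3.3) is what makes $\Pi$ a statistic fit for online tracking: each new observation updates the current vector in $O(M)$ operations. A minimal Python sketch, again assuming a finite $E$ with pmfs $f_i$ (the names `initial_state`, `update`, and `track` are hypothetical):

```python
import numpy as np

def initial_state(p0, nu):
    """Pi_0 = (1 - p0, p0*nu_1, ..., p0*nu_M) as in Proposition 3.2(c)."""
    return np.concatenate(([1.0 - p0], p0 * np.asarray(nu, dtype=float)))

def update(pi, x, p, nu, f):
    """One step of recursion (3.3) for an observation x in a finite E = {0, ..., K-1}."""
    d = np.empty(len(pi))
    d[0] = (1.0 - p) * pi[0] * f[0, x]                          # D_0(pi, x)
    d[1:] = (pi[1:] + pi[0] * p * np.asarray(nu)) * f[1:, x]    # D_i(pi, x), i in M
    return d / d.sum()                                          # divide by D(pi, x)

def track(xs, p0, p, nu, f):
    """Posterior path Pi_0, Pi_1, ..., Pi_n for observations xs = (X_1, ..., X_n)."""
    path = [initial_state(p0, nu)]
    for x in xs:
        path.append(update(path[-1], x, p, nu, f))
    return np.array(path)
```

Each row of the returned array lies in $S^M$, in line with Remark 3.3 below.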

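Remark 3.3. Since $1 = \sum_{i=0}^{M} \Pi_n^{(i)}$, the vector $(\Pi_n^{(0)}, \ldots, \Pi_n^{(M)}) \in S^M$ for every $n \ge 0$. Since $\Pi$ is uniformly bounded and each of its components is a super- or submartingale, the limit $\lim_{n \to \infty} \Pi_n$ exists a.s. by the martingale convergence theorem. Moreover, $\lim_{n \to \infty} \Pi_n^{(0)} = 0$ a.s. by Proposition 3.2(a) since $p \in (0, 1)$: by Fatou's lemma, $\mathbb{E}[\lim_{n \to \infty} \Pi_n^{(0)}] \le \lim_{n \to \infty} (1 - p)^n = 0$.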

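Now, let the functions $h, h_1, \ldots, h_M$ from $S^M$ into $\mathbb{R}_+$ be defined by

$$h(\pi) \triangleq \min_{j \in \mathcal{M}} h_j(\pi) \quad\text{and}\quad h_j(\pi) \triangleq \sum_{i=0}^{M} \pi_i a_{ij}, \quad j \in \mathcal{M},$$

respectively. Then, we note that for every $\delta = (\tau, d) \in \Delta$, we have

$$R(\tau, d) = \mathbb{E}\left[\sum_{n=0}^{\tau-1} c\,\big(1 - \Pi_n^{(0)}\big) + 1_{\{\tau < \infty\}} \sum_{j=1}^{M} 1_{\{d = j\}}\, h_j(\Pi_\tau)\right] \ge \mathbb{E}\left[\sum_{n=0}^{\tau-1} c\,\big(1 - \Pi_n^{(0)}\big) + 1_{\{\tau < \infty\}}\, h(\Pi_\tau)\right] = R(\tau, \tilde{d}),$$

where we define on the event $\{\tau < \infty\}$ the terminal decision rule $\tilde{d}$ to be any index satisfying $h_{\tilde{d}}(\Pi_\tau) = h(\Pi_\tau)$. In other words, an optimal terminal decision depends only upon the value of the $\Pi$ process at the stage in which we stop. Note also that the functions $h$ and $h_1, \ldots, h_M$ are bounded on $S^M$. Therefore, we have the following: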
Lemma 3.4. The minimum Bayes risk (2.3) reduces to the following optimal stopping of the Markov process $\Pi$:

$$R^* = \inf_{(\tau, d) \in \Delta} R(\tau, d) = \inf_{(\tau, d) \in \Delta} R(\tau, \tilde{d}) = \inf_{\tau \in \mathbb{F}} \mathbb{E}\left[\sum_{n=0}^{\tau-1} c\,\big(1 - \Pi_n^{(0)}\big) + 1_{\{\tau < \infty\}}\, h(\Pi_\tau)\right].$$
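The terminal cost $h$ and the rule $\tilde{d}$ are straightforward to evaluate. A minimal Python sketch, assuming the costs are stored in an $(M+1) \times M$ array whose row $0$ holds $a_{0j}$ (this layout and the names `terminal_costs` and `best_decision` are our own):

```python
import numpy as np

def terminal_costs(pi, a):
    """Vector (h_1(pi), ..., h_M(pi)) with h_j(pi) = sum_{i=0}^M pi_i a_{ij}.

    a is an (M+1) x M array: row 0 holds a_{0j}, rows 1..M hold a_{ij}.
    """
    return np.asarray(pi, dtype=float) @ np.asarray(a, dtype=float)

def best_decision(pi, a):
    """Return (h(pi), d) where d in {1, ..., M} attains h(pi) = min_j h_j(pi)."""
    hj = terminal_costs(pi, a)
    j = int(np.argmin(hj))  # ties are broken in favor of the smallest index
    return float(hj[j]), j + 1
```

Any tie-breaking rule is admissible here, since every minimizing index yields the same terminal cost $h(\Pi_\tau)$.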



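We simplify this formulation further by showing that it is enough to take the infimum over

$$\mathcal{C} \triangleq \{\tau \in \mathbb{F} \mid \tau < \infty \text{ a.s. and } \mathbb{E}\,Y_\tau^- < \infty\}, \tag{3.4}$$

where

$$Y_n \triangleq -\sum_{k=0}^{n-1} c\,\big(1 - \Pi_k^{(0)}\big) - h(\Pi_n), \quad n \ge 0, \tag{3.5}$$

is the negative of the minimum cost obtained by making the best terminal decision when the alarm is raised at time $n$. Since $h(\cdot)$ is bounded on $S^M$, the process $\{Y_n, \mathcal{F}_n; n \ge 0\}$ consists of integrable, nonpositive random variables. So the expectation $\mathbb{E}\,Y_\tau$ exists for every $\tau \in \mathbb{F}$, and our problem becomes

$$-R^* = \sup_{\tau \in \mathbb{F}} \mathbb{E}\,Y_\tau. \tag{3.6}$$

Observe that $\mathbb{E}\tau < \infty$ for every $\tau \in \mathcal{C}$ because

$$\infty > \frac{1}{c}\,\mathbb{E}\,Y_\tau^- \ge \mathbb{E}(\tau - \theta)^+ \ge \mathbb{E}(\tau - \theta) = \mathbb{E}\tau - \mathbb{E}\theta \ge \mathbb{E}\tau - \frac{1}{p},$$

where the last step uses $\mathbb{E}\theta = (1 - p_0)/p \le 1/p$. In fact, we have $\mathbb{E}\,Y_\tau > -\infty \iff \mathbb{E}\,Y_\tau^- < \infty \iff \mathbb{E}\tau < \infty$ for every $\tau \in \mathbb{F}$. Since $\sup_{\tau \in \mathbb{F}} \mathbb{E}\,Y_\tau \ge \mathbb{E}\,Y_0 = -h(\Pi_0) > -\infty$, it is enough to consider $\tau \in \mathbb{F}$ such that $\mathbb{E}\tau < \infty$. Namely, (3.6) reduces to

$$-R^* = \sup_{\tau \in \mathcal{C}} \mathbb{E}\,Y_\tau. \tag{3.7}$$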

4. Solution via optimal stopping theory 

In this section we derive an optimal solution for the sequential change diagnosis problem 
in (2.3) by building on the formulation of (3.7) via the tools of optimal stopping theory. 

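4.1. The optimality equation. We begin by applying the method of truncation with a view to passing to the limit to arrive at the final result. For every $N \ge 0$ and $n = 0, \ldots, N$, define the sub-collections

$$\mathcal{C}_n \triangleq \{\tau \vee n \mid \tau \in \mathcal{C}\} \quad\text{and}\quad \mathcal{C}_n^N \triangleq \{\tau \wedge N \mid \tau \in \mathcal{C}_n\}$$

of stopping times in $\mathcal{C}$ of (3.4). Note that $\mathcal{C} = \mathcal{C}_0$. Now, consider the families of (truncated) optimal stopping problems corresponding to $(\mathcal{C}_n)_{n \ge 0}$ and $(\mathcal{C}_n^N)_{0 \le n \le N}$, respectively, defined by

$$-V_n \triangleq \sup_{\tau \in \mathcal{C}_n} \mathbb{E}\,Y_\tau, \quad n \ge 0, \qquad\text{and}\qquad -V_n^N \triangleq \sup_{\tau \in \mathcal{C}_n^N} \mathbb{E}\,Y_\tau, \quad 0 \le n \le N, \; N \ge 0. \tag{4.1}$$

Note that $R^* = V_0$.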
To investigate these optimal stopping problems, we introduce versions of the Snell envelope of $(Y_n)_{n \ge 0}$ (i.e., the smallest regular supermartingale dominating $(Y_n)_{n \ge 0}$) corresponding to $(\mathcal{C}_n)_{n \ge 0}$ and $(\mathcal{C}_n^N)_{0 \le n \le N}$, respectively, defined by

$$\gamma_n \triangleq \operatorname*{ess\,sup}_{\tau \in \mathcal{C}_n} \mathbb{E}[Y_\tau \mid \mathcal{F}_n], \quad n \ge 0, \qquad\text{and}\qquad \gamma_n^N \triangleq \operatorname*{ess\,sup}_{\tau \in \mathcal{C}_n^N} \mathbb{E}[Y_\tau \mid \mathcal{F}_n], \quad 0 \le n \le N, \; N \ge 0. \tag{4.2}$$

Then, through the following series of lemmas, whose proofs are deferred to the Appendix, we point out several useful properties of these Snell envelopes. Finally, we extend these results to an arbitrary initial state vector and establish the optimality equation. Note that each of the ensuing (in)equalities between random variables holds in the $\mathbb{P}$-almost sure sense.

First, these Snell envelopes provide the following alternative expressions for the optimal 
stopping problems introduced in (4.1) above. 

Lemma 4.1. For every $N \ge 0$ and $0 \le n \le N$, we have $-V_n = \mathbb{E}\,\gamma_n$ and $-V_n^N = \mathbb{E}\,\gamma_n^N$.

Second, we have the following backward-induction equations. 

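Lemma 4.2. We have $\gamma_n = \max\{Y_n, \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]\}$ for every $n \ge 0$. For every $N \ge 1$ and $0 \le n \le N - 1$, we have $\gamma_N^N = Y_N$ and $\gamma_n^N = \max\{Y_n, \mathbb{E}[\gamma_{n+1}^N \mid \mathcal{F}_n]\}$.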

We also have that these versions of the Snell envelopes coincide in the limit as $N \to \infty$. That is,

Lemma 4.3. For every $n \ge 0$, we have $\gamma_n = \lim_{N \to \infty} \gamma_n^N$.

Next, recall from (3.2) and Proposition 3.2(c) the operator $\mathbb{T}$, and let us introduce the operator $\mathbb{M}$ on the collection of bounded functions $f : S^M \to \mathbb{R}_+$ defined by

$$(\mathbb{M} f)(\pi) \triangleq \min\{h(\pi),\; c(1 - \pi_0) + (\mathbb{T} f)(\pi)\}, \quad \pi \in S^M. \tag{4.3}$$

Observe that $0 \le \mathbb{M} f \le h$. That is, $\pi \mapsto (\mathbb{M} f)(\pi)$ is a nonnegative bounded function. Therefore, $\mathbb{M}^2 f = \mathbb{M}(\mathbb{M} f)$ is well-defined. If $f$ is nonnegative and bounded, then $\mathbb{M}^n f = \mathbb{M}(\mathbb{M}^{n-1} f)$ is defined for every $n \ge 1$, with $\mathbb{M}^0 f = f$ by definition. Using the operator $\mathbb{M}$, we can express $(\gamma_n^N)_{0 \le n \le N}$ in terms of the process $\Pi$ as stated in the following lemma.

Lemma 4.4. For every $N \ge 0$ and $0 \le n \le N$, we have

$$\gamma_n^N = -c \sum_{k=0}^{n-1} \big(1 - \Pi_k^{(0)}\big) - (\mathbb{M}^{N-n} h)(\Pi_n). \tag{4.4}$$

The next lemma shows how the optimal stopping problems can be rewritten in terms of the operator $\mathbb{M}$. It also conveys the connection between the truncated optimal stopping problems and the initial state $\Pi_0$ of the $\Pi$ process.

Lemma 4.5. We have

(a) $V_0^N = (\mathbb{M}^N h)(\Pi_0)$ for every $N \ge 0$, and

(b) $V_0 = \lim_{N \to \infty} (\mathbb{M}^N h)(\Pi_0)$.

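Observe that since $\Pi_0 \in \mathcal{F}_0 = \{\emptyset, \Omega\}$, we have $\mathbb{P}\{\Pi_0 = \pi\} = 1$ for some $\pi \in S^M$. On the other hand, for every $\pi \in S^M$ we can construct a probability space $(\Omega, \mathcal{F}, \mathbb{P}_\pi)$ hosting a Markov process $\Pi$ with the same dynamics as in (3.3) and $\mathbb{P}_\pi\{\Pi_0 = \pi\} = 1$. Moreover, on such a probability space, the preceding results remain valid. So, let us denote by $\mathbb{E}_\pi$ the expectation with respect to $\mathbb{P}_\pi$ and rewrite (4.1) as

$$-V_n(\pi) = \sup_{\tau \in \mathcal{C}_n} \mathbb{E}_\pi Y_\tau, \quad n \ge 0, \qquad\text{and}\qquad -V_n^N(\pi) = \sup_{\tau \in \mathcal{C}_n^N} \mathbb{E}_\pi Y_\tau, \quad 0 \le n \le N, \; N \ge 0,$$

for every $\pi \in S^M$. Then Lemma 4.5 implies that

$$V_0^N(\pi) = (\mathbb{M}^N h)(\pi) \quad \text{for every } N \ge 0, \qquad\text{and}\qquad V_0(\pi) = \lim_{N \to \infty} (\mathbb{M}^N h)(\pi) \tag{4.5}$$

for every $\pi \in S^M$. Taking limits as $N \to \infty$ of both sides in $(\mathbb{M}^{N+1} h)(\pi) = \mathbb{M}(\mathbb{M}^N h)(\pi)$ and applying the monotone convergence theorem on the right-hand side yields $V_0(\pi) = (\mathbb{M} V_0)(\pi)$. Hence, we have shown the following result.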

Proposition 4.6 (Optimality equation). For every $\pi \in S^M$, we have

$$V_0(\pi) = (\mathbb{M} V_0)(\pi) = \min\{h(\pi),\; c(1 - \pi_0) + (\mathbb{T} V_0)(\pi)\}. \tag{4.6}$$
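The optimality equation suggests computing $V_0$ by value iteration $V \leftarrow \mathbb{M} V$ started from $V = h$, which by (4.5) produces $\mathbb{M}^N h \downarrow V_0$. The Python sketch below does this for the simplest case $M = 1$ on a finite observation space. The uniform grid on $\pi_0$ and the linear interpolation between grid points are our own illustrative approximation, and the function name `value_iteration_M1` is hypothetical.

```python
import numpy as np

def value_iteration_M1(f0, f1, p, c, a01, n_grid=1001, n_iter=500, tol=1e-10):
    """Approximate V_0 on S^1 by iterating V <- M V from V = h (iterates are M^N h).

    Illustrative assumptions: M = 1, finite E with pmfs f0, f1, costs a_{01} = a01 and
    a_{11} = 0, and pi = (pi_0, 1 - pi_0) represented by pi_0 on a uniform grid, with V
    between grid points approximated by linear interpolation.
    """
    f0, f1 = np.asarray(f0, float), np.asarray(f1, float)
    grid = np.linspace(0.0, 1.0, n_grid)                  # values of pi_0
    h = a01 * grid                                        # h(pi) = pi_0 * a_{01}
    d0 = (1 - p) * grid[:, None] * f0[None, :]            # D_0(pi, x), shape (n_grid, |E|)
    d1 = (1 - grid + grid * p)[:, None] * f1[None, :]     # D_1(pi, x), using nu_1 = 1
    D = d0 + d1
    post0 = np.where(D > 0, d0 / np.where(D > 0, D, 1.0), 0.0)  # next pi_0 for each x
    V = h.copy()
    for _ in range(n_iter):
        TV = (D * np.interp(post0, grid, V)).sum(axis=1)  # (T V)(pi) as a sum over x in E
        V_new = np.minimum(h, c * (1 - grid) + TV)        # (M V)(pi)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return grid, V, h
```

The grid points where `V` equals `h` approximate the stopping region $\Gamma$ introduced in Section 4.3. For $M \ge 2$ the same idea applies on a discretization of $S^M$, at a cost that grows quickly with $M$.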

Remark 4.7. By solving for $V_0(\pi)$ for every initial state $\pi \in S^M$, we capture the solution to the original problem, since property (c) of Proposition 3.2 and (3.7) imply that

$$R^* = V_0(1 - p_0, p_0\nu_1, \ldots, p_0\nu_M).$$

4.2. Some properties of the value function. Now, we reveal some important properties of the value function $V_0(\cdot)$ of (4.5). These results help us to establish an optimal solution for $V_0(\cdot)$, and hence an optimal solution for $R^*$, in the next subsection.

Lemma 4.8. If $g : S^M \to \mathbb{R}$ is a bounded concave function, then so is $\mathbb{T} g$.

Proposition 4.9. The mappings $\pi \mapsto V_0^N(\pi)$, $N \ge 0$, and $\pi \mapsto V_0(\pi)$ are concave.

Proposition 4.10. For every $N \ge 1$ and $\pi \in S^M$, we have

$$V_0(\pi) \le V_0^N(\pi) \le V_0(\pi) + \frac{\|h\|^2}{c\,N} + \frac{\|h\|}{p\,N}.$$

Since $\|h\| \triangleq \sup_{\pi \in S^M} |h(\pi)| < \infty$, $\lim_{N \to \infty} V_0^N(\pi) = V_0(\pi)$ uniformly in $\pi \in S^M$.

Proposition 4.11. For every $N \ge 0$, the function $V_0^N : S^M \to \mathbb{R}_+$ is continuous.

Corollary 4.12. The function $V_0 : S^M \to \mathbb{R}_+$ is continuous.

Note that $S^M$ is a compact subset of $\mathbb{R}^{M+1}$, so while continuity of $V_0(\cdot)$ on the relative interior of $S^M$ follows from the concavity of $V_0(\cdot)$ by Proposition 4.9, Corollary 4.12 establishes continuity on all of $S^M$, including its boundary.
4.3. An optimal sequential decision strategy. Finally, we describe the optimal stopping region in $S^M$ implied by the value function $V_0(\cdot)$, and we present an optimal sequential decision strategy for our problem. Let us define, for every $N \ge 0$,

$$\Gamma_N \triangleq \{\pi \in S^M \mid V_0^N(\pi) = h(\pi)\}, \qquad \Gamma_N^{(j)} \triangleq \Gamma_N \cap \{\pi \in S^M \mid h(\pi) = h_j(\pi)\}, \quad j \in \mathcal{M},$$

$$\Gamma \triangleq \{\pi \in S^M \mid V_0(\pi) = h(\pi)\}, \qquad \Gamma^{(j)} \triangleq \Gamma \cap \{\pi \in S^M \mid h(\pi) = h_j(\pi)\}, \quad j \in \mathcal{M}.$$

Theorem 4.15 below shows that it is always optimal to stop and raise an alarm as soon as the posterior probability process $\Pi$ enters the region $\Gamma$. Intuitively, this follows from the optimality equation (4.6). At any stage, we always have two choices: either we stop immediately and raise an alarm, or we wait for at least one more stage and take an additional observation. If the posterior probability of all possibilities is given by the vector $\pi$, then the costs of those competing actions equal $h(\pi)$ and $c(1 - \pi_0) + (\mathbb{T} V_0)(\pi)$, respectively, and it is always better to take the action that has the smaller expected cost. The cost of stopping is no larger (and therefore stopping is optimal) if $h(\pi) \le c(1 - \pi_0) + (\mathbb{T} V_0)(\pi)$, equivalently, if $V_0(\pi) = h(\pi)$. Likewise, if at most $N$ stages are left, then stopping is optimal if $V_0^N(\pi) = h(\pi)$, i.e., if $\pi \in \Gamma_N$.

For each $j \in \{0\} \cup \mathcal{M}$, let $e_j \in S^M$ denote the unit vector consisting of zero in every component except for the $j$th component, which is equal to one. Note that $e_0, \ldots, e_M$ are the extreme points of the closed convex set $S^M$, and any vector $\pi = (\pi_0, \ldots, \pi_M) \in S^M$ can be expressed in terms of $e_0, \ldots, e_M$ as $\pi = \sum_{j=0}^{M} \pi_j e_j$.

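Theorem 4.13. For every $j \in \mathcal{M}$, $(\Gamma_N^{(j)})_{N \ge 0}$ is a decreasing sequence of non-empty, closed, convex subsets of $S^M$. Moreover,

$$\Gamma_0^{(j)} \supseteq \Gamma_1^{(j)} \supseteq \cdots \supseteq \Gamma^{(j)} \supseteq \{\pi \in S^M \mid h_j(\pi) \le \min\{h(\pi), c(1 - \pi_0)\}\} \ni e_j,$$

$$\Gamma = \bigcap_{N=1}^{\infty} \Gamma_N = \bigcup_{j=1}^{M} \Gamma^{(j)}, \qquad\text{and}\qquad \Gamma^{(j)} = \bigcap_{N=1}^{\infty} \Gamma_N^{(j)}, \quad j \in \mathcal{M}.$$

Furthermore, $S^M = \Gamma_0 \supseteq \Gamma_1 \supseteq \cdots \supseteq \Gamma \supseteq \{e_1, \ldots, e_M\}$.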

Lemma 4.14. For every $n \ge 0$, we have $\gamma_n = -c \sum_{k=0}^{n-1} \big(1 - \Pi_k^{(0)}\big) - V_0(\Pi_n)$.

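Theorem 4.15. Let $\sigma \triangleq \inf\{n \ge 0 \mid \Pi_n \in \Gamma\}$.

(a) The stopped process $\{\gamma_{n \wedge \sigma}, \mathcal{F}_n; n \ge 0\}$ is a martingale.

(b) The random variable $\sigma$ is an optimal stopping time for $V_0$, and

(c) $\mathbb{E}\,\sigma < \infty$.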



Therefore, the pair $(\sigma, d^*)$ is an optimal sequential decision strategy for (2.3), where the optimal stopping rule $\sigma$ is given by Theorem 4.15 and, as in the proof of Lemma 3.4, the optimal terminal decision rule $d^*$ is given by

$$d^* = j \quad \text{on the event } \{\sigma = n, \; \Pi_n \in \Gamma^{(j)}\} \text{ for every } n \ge 0.$$
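The rule $(\sigma, d^*)$ translates directly into an online loop. The Python sketch below is hypothetical in all its names: it assumes the caller supplies the posterior update (for instance recursion (3.3)), a membership test for $\Gamma$ (in practice read off a numerically computed $V_0$), and the vector $(h_1, \ldots, h_M)$.

```python
import numpy as np

def diagnose(observations, pi0, update, in_gamma, h_vec):
    """Apply the rule (sigma, d*) to a stream of observations.

    update(pi, x) -> next posterior vector (e.g. recursion (3.3)).
    in_gamma(pi)  -> True if pi lies in the stopping region Gamma.
    h_vec(pi)     -> array (h_1(pi), ..., h_M(pi)).
    Returns (sigma, d_star, Pi_sigma), or (None, None, last Pi) if no alarm is raised.
    """
    pi = np.asarray(pi0, dtype=float)
    n = 0
    stream = iter(observations)
    while True:
        if in_gamma(pi):                      # sigma = first n >= 0 with Pi_n in Gamma
            j = int(np.argmin(h_vec(pi)))     # Pi_sigma in Gamma^(j) iff h_j = h there
            return n, j + 1, pi
        x = next(stream, None)
        if x is None:                         # data exhausted before an alarm
            return None, None, pi
        pi = np.asarray(update(pi, x), dtype=float)
        n += 1
```

If $V_0$ is not available, the set $\{\pi \mid h(\pi) \le c(1 - \pi_0)\}$, which Theorem 4.13 places inside $\Gamma$, can serve as a conservative stand-in for `in_gamma`: stopping there is never premature, but it may stop later than the optimal rule.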
Accordingly, the set $\Gamma$ is called the stopping region implied by $V_0(\cdot)$, and Theorem 4.13 reveals its basic structure. We demonstrate the use of these results in the numerical examples of Section 5.

Note that we can take a similar approach to prove that the stopping rules $\sigma_N \triangleq \inf\{n \ge 0 \mid \Pi_n \in \Gamma_{N-n}\}$, $N \ge 0$, are optimal for the truncated problems $V_0^N(\cdot)$, $N \ge 0$, in (4.5). Thus, for each $N \ge 0$, the set $\Gamma_N$ is called the stopping region for $V_0^N(\cdot)$: it is optimal to terminate the experiments in $\Gamma_N$ if $N$ stages are left before truncation.

5. Special cases and examples 

In this section we discuss solutions for various special cases of the general formulation given in Section 2. First, we show how the traditional problems of Bayesian sequential change detection and Bayesian sequential multi-hypothesis testing are formulated via the framework of Section 2. Then we present numerical examples for the cases $M = 2$ and $M = 3$. In particular, we develop a geometrical framework for working with the sufficient statistic developed in Section 3 and the optimal sequential decision strategy developed in Section 4. Finally, we solve the special problem of detection and identification of primary component failure(s) in a system with suspended animation.

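5.1. A. N. Shiryaev's sequential change detection problem. Set $a_{0j} = 1$ for $j \in \mathcal{M}$ and $a_{ij} = 0$ for $i, j \in \mathcal{M}$; then the Bayes risk function (2.2) becomes

$$R(\delta) = c\,\mathbb{E}[(\tau - \theta)^+] + \mathbb{E}\big[a_{0d} 1_{\{\tau < \theta\}} + a_{\mu d} 1_{\{\theta \le \tau < \infty\}}\big] = c\,\mathbb{E}[(\tau - \theta)^+] + \mathbb{E}\big[1_{\{\tau < \theta\}}\big] = \mathbb{P}\{\tau < \theta\} + c\,\mathbb{E}[(\tau - \theta)^+].$$

This is the Bayes risk studied by Shiryaev [19, 20] to solve the sequential change detection problem.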
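Substituting these costs into the definitions of Sections 3 and 4 shows what the general machinery reduces to. The display below is a direct specialization written out for convenience; the terminal cost no longer depends on the decision, so only the stopping problem remains.

```latex
% Costs of Section 5.1: a_{0j} = 1 and a_{ij} = 0 for i, j in \mathcal{M}
h_j(\pi) = \sum_{i=0}^{M} \pi_i a_{ij} = \pi_0 \quad (j \in \mathcal{M}),
\qquad h(\pi) = \pi_0,
\qquad V_0(\pi) = \min\{\pi_0,\; c(1-\pi_0) + (\mathbb{T}V_0)(\pi)\}.

% For M = 1 (so \pi_1 = 1 - \pi_0 and \nu_1 = 1), recursion (3.3) reduces to
\Pi^{(0)}_{n+1} = \frac{(1-p)\,\Pi^{(0)}_n f_0(X_{n+1})}
  {(1-p)\,\Pi^{(0)}_n f_0(X_{n+1}) + \big(1 - \Pi^{(0)}_n + p\,\Pi^{(0)}_n\big) f_1(X_{n+1})}.
```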

5.2. Sequential multi-hypothesis testing. Set $p_0 = 1$; then $\theta = 0$ a.s., and thus the Bayes risk function (2.2) becomes

$$R(\delta) = c\,\mathbb{E}[(\tau - \theta)^+] + \mathbb{E}\big[a_{0d} 1_{\{\tau < \theta\}} + a_{\mu d} 1_{\{\theta \le \tau < \infty\}}\big] = \mathbb{E}\big[c\,\tau + a_{\mu d} 1_{\{\tau < \infty\}}\big].$$

This gives the sequential multi-hypothesis testing problem studied by Wald and Wolfowitz [22] and by Arrow, Blackwell, and Girshick [1]; see also Blackwell and Girshick [8].
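With $p_0 = 1$ the general recursion collapses to the usual Bayesian update over the $M$ hypotheses. The display below is a direct substitution into (3.1), (3.3), and (4.6), written out for convenience.

```latex
% p_0 = 1: the change is present from the start
\Pi_0 = (0, \nu_1, \ldots, \nu_M), \qquad
D_0(\pi, x) = (1-p)\,\pi_0 f_0(x) = 0 \ \text{ whenever } \pi_0 = 0,
\quad\text{so } \Pi_n^{(0)} = 0 \text{ for all } n \ge 0;

% recursion (3.3) becomes Bayes' rule for the hypotheses \mu = i
\Pi_{n+1}^{(i)} = \frac{\Pi_n^{(i)} f_i(X_{n+1})}{\sum_{j=1}^{M} \Pi_n^{(j)} f_j(X_{n+1})},
\qquad i \in \mathcal{M};

% the running cost c(1 - \Pi_n^{(0)}) equals c, and (4.6) reads
V_0(\pi) = \min\Big\{\min_{j\in\mathcal{M}} \sum_{i=1}^{M} \pi_i a_{ij},\; c + (\mathbb{T}V_0)(\pi)\Big\}
\quad \text{for } \pi \in S^M \text{ with } \pi_0 = 0.
```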

5.3. Two alternatives after the change. In this subsection we consider the special case $M = 2$, in which we have only two possible change distributions, $f_1(\cdot)$ and $f_2(\cdot)$. We describe a graphical representation of the stopping and continuation regions for an arbitrary instance of the special case $M = 2$. Then we use this representation to illustrate geometrical properties of the optimal method (Section 4.3) via model instances for certain choices of the model parameters $p_0$, $p$, $\nu_1$, $\nu_2$, $f_0(\cdot)$, $f_1(\cdot)$, $f_2(\cdot)$, $a_{01}$, $a_{02}$, $a_{12}$, $a_{21}$, and $c$.

Let the linear mapping $L : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by $L(\pi_0, \pi_1, \pi_2) \triangleq \left(\frac{2}{\sqrt{3}}\pi_1 + \frac{1}{\sqrt{3}}\pi_2,\; \pi_2\right)$. Since $\pi_0 = 1 - \pi_1 - \pi_2$ for every $\pi = (\pi_0, \pi_1, \pi_2) \in S^2 \subset \mathbb{R}^3$, we can recover the preimage $\pi$ of any point $L(\pi) \in L(S^2) \subset \mathbb{R}^2$. For every point $\pi = (\pi_0, \pi_1, \pi_2) \in S^2$, the coordinate $\pi_i$ is given by the Euclidean distance from the image point $L(\pi)$ to the edge of the image triangle $L(S^2)$ that is opposite the image point $L(e_i)$, for each $i = 0, 1, 2$. For example, the distance from the image point $L(\pi)$ to the edge of the image triangle opposite the lower-left-hand corner $L(e_0) = L(1, 0, 0) = (0, 0)$ is the value of the preimage coordinate $\pi_0$. See Figure 1.

Figure 1. Linear mapping $L$ of the standard two-dimensional probability simplex $S^2$ from the positive orthant of $\mathbb{R}^3$ into the positive quadrant of $\mathbb{R}^2$.
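A small Python sketch of $L$ and its inverse, which also lets one check numerically the distance property described above; the function names `L`, `L_inverse`, and `dist_to_line` are ours.

```python
import numpy as np

SQ3 = np.sqrt(3.0)

def L(pi):
    """Map pi = (pi_0, pi_1, pi_2) in S^2 into the plane (equilateral triangle of height 1)."""
    pi0, pi1, pi2 = pi
    return np.array([2.0 / SQ3 * pi1 + 1.0 / SQ3 * pi2, pi2])

def L_inverse(point):
    """Recover pi from L(pi), using pi_0 = 1 - pi_1 - pi_2."""
    u, v = point
    pi2 = v
    pi1 = (SQ3 * u - v) / 2.0
    return np.array([1.0 - pi1 - pi2, pi1, pi2])

def dist_to_line(q, a, b):
    """Euclidean distance from the point q to the line through the points a and b."""
    (ax, ay), (bx, by), (qx, qy) = a, b, q
    return abs((bx - ax) * (qy - ay) - (by - ay) * (qx - ax)) / np.hypot(bx - ax, by - ay)
```

The images of $e_0, e_1, e_2$ are $(0, 0)$, $(2/\sqrt{3}, 0)$, and $(1/\sqrt{3}, 1)$, an equilateral triangle of height one; since $L(\pi) = \sum_i \pi_i L(e_i)$, the distance to the edge opposite $L(e_i)$ equals $\pi_i$.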

Therefore, we can work with the images $L(\Gamma)$ and $L(S^2 \setminus \Gamma)$ of the stopping region $\Gamma$ and the continuation region $S^2 \setminus \Gamma$, respectively. Accordingly, we depict the decision region for each instance in this subsection using the two-dimensional representation as on the right-hand side of Figure 1, and we drop the $L(\cdot)$ notation when labeling various parts of each figure to emphasize their source in $S^2$.

Each of the examples in this section has the following model parameters in common:

iO U' 4' 4' 4/ ' VIO' 10' 10' 10/ ' •'2 Uo' 10' 10' 10/ ■ 

We vary the delay cost and the false alarm/isolation costs to illustrate certain geometrical properties of the continuation and stopping regions. See Figures 2, 3, and 4.

Specifically, these examples show instances in which the $M = 2$ convex subsets comprising the optimal stopping region are connected (Figure 2) and instances in which they are not (Figures 3 and 4(a)). Figure 4(b) shows an instance in which the continuation region is disconnected.

Each of the figures in this section has certain features in common. On each subfigure there is a dashed line representing those states $\pi \in S^2$ at which $h_1(\pi) = h_2(\pi)$. Also, each subfigure shows a sample path of $(\Pi_n)_{n \ge 0}$ and the realizations of $\theta$ and $\mu$ for the sample. The shaded area, including its solid boundary, represents the optimal stopping region, while the unshaded area represents the continuation region.

[Figure 2: panels (a) and (b); triangle vertices labeled (1,0,0), (0,1,0), (0,0,1)]

Figure 2. Illustration of connected stopping regions and the effects of variation in the false-alarm costs. (a) and (b): $a_{12} = a_{21} = 3$, $c = 1$. (a): $a_{01} = a_{02} = 10$. (b): $a_{01} = a_{02} = 50$.

Figure 3. Illustration of disconnected stopping regions and the effects of asymmetric false-isolation costs. (a) and (b): $a_{01} = a_{02} = 10$, $c = 1$. (a): $a_{12} = a_{21} = 10$. (b): $a_{12} = 16$, $a_{21} = 4$.

An implementation of the optimal strategy as described in Section 4.3 is as follows. Initialize the statistic $\Pi = (\Pi_n)_{n \ge 0}$ by setting $\Pi_0 = (1 - p_0, p_0\nu_1, p_0\nu_2)$ as in part (c) of Proposition 3.2. Use the dynamics of (3.3) to update the statistic $\Pi_n$ as each observation $X_n$ is realized. Stop taking observations when the statistic $\Pi_n$ enters the stopping region $\Gamma = \Gamma^{(1)} \cup \Gamma^{(2)}$ for the first time, possibly before the first observation is taken (i.e., $n = 0$). The optimal terminal decision is based upon whether the statistic $\Pi_n$ is in $\Gamma^{(1)}$ or $\Gamma^{(2)}$ upon stopping. Each of the sample paths in Figures 2, 3 and 4 was generated via this algorithm. As Figure 2 shows, the sets $\Gamma^{(1)}$ and $\Gamma^{(2)}$ can intersect on their boundaries, and so it is possible to stop in their intersection. In this case, either of the decisions $d = 1$ or $d = 2$ is optimal.

Figure 4. Illustration of a disconnected continuation region and the effects of variation in the delay cost. (a) and (b): $a_{01} = 14$, $a_{02} = 20$, $a_{12} = a_{21} = 8$. (a): $c = 1$. (b): $c = 2$.
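The recursion just described fits in a few lines of code. The sketch below is a minimal illustration, not the authors' implementation. It assumes a finite observation alphabet $E = \{0, \dots, |E|-1\}$ with counting measure $m$, so each $f_i$ is a probability vector. It also assumes $D_0(\pi,x) = (1-p)\pi_0 f_0(x)$ and $D_i(\pi,x) = (\pi_i + p\nu_i\pi_0) f_i(x)$, the form implied by (A.3) in Appendix A. The stopping region enters only through a hypothetical user-supplied oracle `region_of`, because $\Gamma^{(1)}$ and $\Gamma^{(2)}$ are available only numerically.

```python
import numpy as np

def update_posterior(pi, x, p, nu, f):
    """One step of the recursion (3.3) for a finite observation alphabet.

    Assumptions (illustrative helper, not from the paper): E = {0, ..., |E|-1},
    m is counting measure, and row i of f is the probability mass function f_i.
    """
    pi = np.asarray(pi, dtype=float)
    nu = np.asarray(nu, dtype=float)
    f = np.asarray(f, dtype=float)
    d = np.empty_like(pi)
    d[0] = (1.0 - p) * pi[0] * f[0, x]            # D_0(pi, x)
    d[1:] = (pi[1:] + pi[0] * p * nu) * f[1:, x]  # D_i(pi, x), i = 1, ..., M
    return d / d.sum()                            # divide by D(pi, x)

def run_diagnosis(xs, p0, p, nu, f, region_of):
    """Set Pi_0 = (1 - p0, p0*nu), update with (3.3), stop on first entry into Gamma.

    region_of(pi) is a hypothetical oracle: it returns i if pi lies in Gamma^(i)
    and None if pi lies in the continuation region.
    Returns (stopping time n, declared type d, Pi_n), or (None, None, Pi) if the
    observations run out before stopping.
    """
    pi = np.concatenate(([1.0 - p0], p0 * np.asarray(nu, dtype=float)))
    for n in range(len(xs) + 1):
        d = region_of(pi)                 # check Pi_n (n = 0 is allowed)
        if d is not None:
            return n, d, pi
        if n < len(xs):
            pi = update_posterior(pi, xs[n], p, nu, f)   # xs[n] plays X_{n+1}
    return None, None, pi
```

When experimenting, a toy threshold rule such as `lambda pi: 1 if pi[1] >= 0.9 else None` can stand in for the oracle. It is not the optimal stopping region.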

We use value iteration of the optimality equation (4.6) over a fine discretization of $S^2$ to compute $V_0(\cdot)$ and generate the decision region for each subfigure. In the expression $V_0(\pi) = \min\{h(\pi), c(1 - \pi_0) + (\mathbb{T}V_0)(\pi)\}$, the value $V_0(\pi)$ on the left for any fixed initial condition $\Pi_0 = \pi$ depends on the entire function $V_0(\cdot)$ on $S^2$ on the right. We therefore have to calculate $V_0(\cdot)$ (or approximate it by $V_0^N(\cdot)$) on the entire space $S^2$. The resulting discretized decision region is mapped into the plane via $L$.
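The following sketch carries out this computation for $M = 2$ on a coarse grid. Several assumptions here are not taken from the paper: a finite alphabet with counting measure, a triangular grid with spacing $1/n$ on $S^2$, and rounding of each updated posterior to the nearest grid point instead of a proper interpolation. Starting from $V_0^0 = h$ and applying $\mathbb{M}$ repeatedly gives the grid analogue of $V_0^N = \mathbb{M}^N h$ (see (A.5) in Appendix A). Grid points where the result equals $h$ approximate the stopping region.

```python
import numpy as np

def value_iteration_m2(f, p, nu, a, c, n_grid=20, n_iter=50):
    """Approximate V_0^N = M^N h on a triangular grid over S^2 (M = 2).

    Assumptions (illustrative sketch, not the authors' code): finite alphabet E
    with m = counting measure (row i of f is the pmf f_i), a[k, j-1] = a_{kj},
    and off-grid posteriors are rounded to the nearest grid point.
    Returns (V, H), indexed by (n_grid*pi_1, n_grid*pi_2); NaN off the simplex.
    """
    f, nu, a = (np.asarray(z, dtype=float) for z in (f, nu, a))
    pts = [(i, j) for i in range(n_grid + 1) for j in range(n_grid + 1 - i)]

    def point(i, j):
        return np.array([1.0 - (i + j) / n_grid, i / n_grid, j / n_grid])

    def nearest(pi):
        i = int(round(pi[1] * n_grid))
        return i, min(int(round(pi[2] * n_grid)), n_grid - i)

    H = np.full((n_grid + 1, n_grid + 1), np.nan)
    for i, j in pts:
        H[i, j] = (point(i, j) @ a).min()        # h(pi) = min_j sum_k pi_k a_kj
    V = H.copy()                                 # V_0^0 = h
    for _ in range(n_iter):
        V_new = V.copy()
        for i, j in pts:
            pi = point(i, j)
            cont = c * (1.0 - pi[0])             # c(1 - pi_0) + (T V)(pi)
            for x in range(f.shape[1]):
                d = np.empty(3)
                d[0] = (1.0 - p) * pi[0] * f[0, x]
                d[1:] = (pi[1:] + pi[0] * p * nu) * f[1:, x]
                D = d.sum()
                if D > 0:
                    cont += D * V[nearest(d / D)]
            V_new[i, j] = min(H[i, j], cont)     # (M V)(pi)
        V = V_new
    return V, H
```

With the cost parameters of Figure 2(a), that is, `a = [[10, 10], [0, 3], [3, 0]]` and `c = 1`, and any choice of $f_0, f_1, f_2$, the mask `np.isclose(V, H)` marks the approximate stopping region on the grid. Because the rounded operator is still monotone and $\mathbb{M}h \le h$, the iterates decrease in $N$, just as $V_0^N$ does.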

See Bertsekas (Chapter 3) for techniques of computing the value function via the optimality equation, such as value iteration. Solving the optimality equation by discretizing a high-dimensional state space may not be the best option. Monte Carlo methods based on regression models for the value function seem to scale better as the dimension of the state space increases; see, for example, Longstaff and Schwartz [15], Tsitsiklis and van Roy [21], and Glasserman [10, Chapter 8] for details.



5.4. Three alternatives after the change. In this subsection we consider the special case $M = 3$, in which there are three possible change distributions, $f_1(\cdot)$, $f_2(\cdot)$ and $f_3(\cdot)$. Here, the continuation and stopping regions are subsets of $S^3 \subset \mathbb{R}^4$. As in the two-alternatives case, we introduce the mapping of $S^3 \subset \mathbb{R}^4$ into $\mathbb{R}^3$ via
$$L(\pi_0,\pi_1,\pi_2,\pi_3) = \left(\sqrt{\tfrac{3}{2}}\Big(\pi_1 + \tfrac{1}{2}\pi_2 + \tfrac{1}{2}\pi_3\Big),\ \tfrac{3}{2\sqrt{2}}\,\pi_2 + \tfrac{1}{2\sqrt{2}}\,\pi_3,\ \pi_3\right).$$

Figure 5. Illustration of the mapped decision region for an instance of the special case $M = 3$; see also Figure 7 below. A sample path of the process $\Pi$ is shown in which $\theta = 6$ and $\mu = 3$.

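Then we use this representation (actually a rotation of it) to illustrate in Figure 5 an instance with the following model parameters:
$$p_0 = \tfrac{\ldots}{50}, \quad p = \tfrac{\ldots}{20}, \quad \nu_1 = \nu_2 = \nu_3 = \tfrac{1}{3},$$
$$f_0 = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right), \quad f_1 = \left(\tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}\right), \quad f_2 = \left(\tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}\right), \quad f_3 = \left(\tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}, \tfrac{\ldots}{10}\right),$$
$$a_{0j} = 40, \quad a_{ij} = 20, \quad i \neq j, \quad i, j = 1, 2, 3.$$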



Note that Figure 5 can be interpreted in a manner similar to the figures of the previous subsection. In this case, for every point $\pi = (\pi_0, \pi_1, \pi_2, \pi_3) \in S^3$, the coordinate $\pi_i$ is given by the (Euclidean) distance from the image point $L(\pi)$ to the face of the image tetrahedron $L(S^3)$ that is opposite the image corner $L(e_i)$, for each $i = 0, 1, 2, 3$.
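The distance property can be checked numerically once concrete vertex coordinates are fixed. The sketch below sends the corners $e_0, \dots, e_3$ to the vertices of a regular tetrahedron with edge length $\sqrt{3/2}$, which makes every vertex-to-opposite-face height equal to 1. It then checks that the distance from $L(\pi)$ to the face opposite $L(e_i)$ equals $\pi_i$. The coordinates are one assumed choice, and any rotation of them works equally well. The names `CORNERS_3D`, `L3` and `distance_to_face` are illustrative.

```python
import numpy as np

# Assumed images of e_0, ..., e_3 (rows); L(pi) = pi @ CORNERS_3D is linear in pi.
CORNERS_3D = np.array([
    [0.0, 0.0, 0.0],
    [np.sqrt(6) / 2, 0.0, 0.0],
    [np.sqrt(6) / 4, 3 * np.sqrt(2) / 4, 0.0],
    [np.sqrt(6) / 4, np.sqrt(2) / 4, 1.0],
])

def L3(pi):
    """Linear map of S^3 into R^3 onto a regular tetrahedron with unit heights."""
    return np.asarray(pi, dtype=float) @ CORNERS_3D

def distance_to_face(y, i):
    """Euclidean distance from y to the plane of the face opposite corner L(e_i)."""
    face = np.delete(CORNERS_3D, i, axis=0)
    normal = np.cross(face[1] - face[0], face[2] - face[0])
    return abs(np.dot(y - face[0], normal)) / np.linalg.norm(normal)
```

Because every height equals 1, the barycentric coordinate $\pi_i$ of $L(\pi)$ coincides with its distance to the opposite face. This is exactly the reading of Figure 5 described above.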
5.5. Detection and identification of component failure(s) in a system with suspended animation. Consider a system consisting initially of two working concealed components (labeled 1 and 2) such that upon the failure of either component, the system goes into a state of suspended animation. That is, while both components are still working normally, observations of the output of the system have density $f_0(\cdot)$. Upon failure of either component, the density of observations changes thereafter (until an alarm is raised) to one of two alternatives. If component 2 fails before component 1, then post-failure observations have density $f_2(\cdot)$; otherwise they have density $f_1(\cdot)$. The problem is to detect quickly when there has been a component failure and to identify accurately which component has actually failed, based only on sequential observations of the output of the system.
Let the random variables
$$\theta := \theta_1 \wedge \theta_2 \quad\text{and}\quad \mu := \begin{cases} 1, & \theta_1 \le \theta_2,\\ 2, & \theta_1 > \theta_2, \end{cases}$$
be, respectively, the time of failure of the first failed component of the system and the corresponding index of this component. Here the failure time $\theta_i$ of the $i$th component is a random variable having a geometric distribution with failure probability $p_i$, $i = 1, 2$. It can be shown easily that when the disorder times $\theta_1$ and $\theta_2$ are independent, the random variable $\theta$ has a geometric distribution with failure probability $p := p_1 + p_2 - p_1p_2$. Equivalently, $\theta$ has a zero-modified geometric distribution with parameters $p_0 = 0$ and $p$. Moreover, $\theta$ is independent of the random variable $\mu$, which has distribution $\nu_1 = p_1/p$ and $\nu_2 = 1 - \nu_1$. So although the failure type (i.e., which component has failed) is a function of the failure times of each component, this problem fits properly within the Bayesian sequential change diagnosis framework.
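The claim can be checked by computing the joint law directly. This short derivation assumes, as in the text, that $\theta_1$ and $\theta_2$ are independent and geometric on $\{1, 2, \dots\}$. It also assumes that a simultaneous failure is attributed to component 1 (i.e., $\mu = 1$ when $\theta_1 = \theta_2$). Using $1 - p = (1 - p_1)(1 - p_2)$, for every $t \ge 1$,
$$\begin{aligned} \mathbb{P}\{\theta = t, \mu = 1\} &= \mathbb{P}\{\theta_1 = t\}\,\mathbb{P}\{\theta_2 \ge t\} = (1-p_1)^{t-1}p_1\,(1-p_2)^{t-1} = (1-p)^{t-1}p\cdot\frac{p_1}{p},\\ \mathbb{P}\{\theta = t, \mu = 2\} &= \mathbb{P}\{\theta_2 = t\}\,\mathbb{P}\{\theta_1 > t\} = (1-p_2)^{t-1}p_2\,(1-p_1)^{t} = (1-p)^{t-1}p\cdot\frac{p_2(1-p_1)}{p}. \end{aligned}$$
Each joint probability factors into $\mathbb{P}\{\theta = t\} = (1-p)^{t-1}p$ times a term free of $t$. This gives the independence of $\theta$ and $\mu$, together with $\nu_1 = p_1/p$ and $\nu_2 = p_2(1-p_1)/p = (p - p_1)/p = 1 - \nu_1$.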

This problem can be extended naturally to several components and solved via the machinery of Sections 3 and 4. In fact, it can be configured for a variety of scenarios. For example, series-connected components where malfunction of one component immediately suspends the operation of all the remaining components appear in various electronic relays. They also appear in multicomponent electronic devices that have fuses to protect the system from the misbehavior of one of its components. Since the system may react differently to diagnostics run by the operators, post-malfunction behavior can differ according to the underlying cause of the malfunction. See Barlow [2, Section 8.4] for background on series systems with suspended animation. Consider also a manufacturing process where we perform a quality test on the final output produced from several processing components. If a component is highly reliable, then a geometric distribution with a low failure rate can be a reasonable choice for the lifetime of the component. Moreover, since the typical duration between successive component failures widens over time, under certain cost structures we can often treat the remaining components as if they enter a state of suspended animation. That is, we can expect the remaining components to outlive the alarm. For example, suppose that two independent geometric random variables each have an expected lifetime of 1000. Then the first failure will occur at about time 500 on average, while the second failure will take an additional 1000 periods on average to occur. As illustrated in Figures 2 and 4, respectively, lower false-alarm costs promote raising the alarm earlier, while a higher delay cost discourages waiting for more than relatively few additional periods to raise the alarm.
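The figures quoted in this example follow from the two-component formulas above. As a worked check, take $p_1 = p_2 = 10^{-3}$, so that each expected lifetime is $1/p_k = 1000$. Then
$$\mathbb{E}[\theta] = \frac{1}{p} = \frac{1}{1 - (1 - 10^{-3})^2} = \frac{1}{1.999 \times 10^{-3}} \approx 500.25.$$
By the memoryless property of the geometric distribution, the surviving component then needs on average another $1/p_k = 1000$ periods to fail.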

Specifically, suppose that in a "black box" there are $K$ components whose lifetimes are independent and geometrically distributed. Observations initially have distribution $f_0(\cdot)$ while the system is working. Upon failure of a single component (or simultaneous failure of multiple components), the remaining components enter a state of suspended animation, and the post-failure distribution of observations is determined by the failed component(s). We want to detect the time when at least one of them fails as soon as possible. Moreover, when we raise an alarm, we would like to diagnose as accurately as possible (1) how many of the components have actually failed, and (2) which ones.

Again, let the failure time $\theta_k$ of the $k$th component be a random variable having a geometric distribution with failure probability $p_k$, $k \in \mathcal{K} := \{1, 2, \dots, K\}$, and define
$$\theta := \theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_K = \min_{k \in \mathcal{K}} \theta_k$$
as the time when at least one of the $K$ components fails. Let the mapping $\psi : 2^{\mathcal{K}} \to \{0, 1, \dots\}$ be a nonnegative-integer-valued measure on the discrete $\sigma$-algebra $2^{\mathcal{K}}$ of the set $\mathcal{K} = \{1, 2, \dots, K\}$ of component indices, and define the random variable
$$\mu := \psi(\{k \in \mathcal{K} \mid \theta = \theta_k\})$$
as an index function on the set of indices of the failed components. When the random variables $\theta_1, \dots, \theta_K$ are independent, it can be shown that the random variable $\theta$ has a geometric distribution with failure probability
$$p := 1 - \prod_{k \in \mathcal{K}} (1 - p_k),$$
and that it is independent of the random variable $\mu$, which has distribution
$$\nu_i := \frac{1}{p} \sum_{A \in \psi^{-1}(i)} \prod_{k \in A} p_k \prod_{\ell \in \mathcal{K} \setminus A} (1 - p_\ell), \quad i \in \mathcal{M} := \{1, 2, \dots, M := \psi(\mathcal{K})\}.$$
So, the preceding example of two components corresponds to the special case where $K = 2$ and $\psi(A) = \min A$ for $A \in \{\{1\}, \{1, 2\}, \{2\}\}$. We can handle the other two aforementioned objectives as follows:

(1) Let $\psi(A) = |A|$, $A \in 2^{\mathcal{K}}$. Then the random variable $\mu$ represents how many components fail.

(2) Let $\psi(A) = \sum_{i \in A} 2^{i-1}$, $A \in 2^{\mathcal{K}}$. Then the mapping $\psi$ is one-to-one, the random variable $\mu$ takes values in $1, 2, \dots, 2^K - 1$, and the set $\psi^{-1}(\mu)$ consists of the indices of the components that fail; i.e., the random variable $\mu$ identifies uniquely which components fail.
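For a small number of components, the distribution $\nu$ can be tabulated by enumerating the nonempty subsets $A \subseteq \mathcal{K}$. The helper below is a sketch, and its name `failure_type_distribution` is illustrative. It evaluates $p = 1 - \prod_k (1 - p_k)$ and, for each $i$, the normalized total weight of the failure sets $A$ with $\psi(A) = i$. The three choices of $\psi$ discussed above are included as examples. The enumeration costs $2^K$ terms, so it is meant only for modest $K$.

```python
from itertools import combinations
from math import prod

def failure_type_distribution(p, psi):
    """Return (p_total, nu) for independent geometric failure times.

    p   : per-component failure probabilities p_1, ..., p_K.
    psi : maps a nonempty frozenset A of component indices to an integer.
    nu[i] = (1/p_total) * sum over A with psi(A) = i of
            prod_{k in A} p_k * prod_{k not in A} (1 - p_k).
    """
    K = len(p)
    p_total = 1.0 - prod(1.0 - pk for pk in p)
    nu = {}
    for size in range(1, K + 1):
        for A in combinations(range(1, K + 1), size):
            weight = prod(p[k - 1] if k in A else 1.0 - p[k - 1] for k in range(1, K + 1))
            i = psi(frozenset(A))
            nu[i] = nu.get(i, 0.0) + weight / p_total
    return p_total, nu

psi_first = lambda A: min(A)                          # two-component example (ties -> 1)
psi_count = lambda A: len(A)                          # how many components fail
psi_binary = lambda A: sum(2 ** (k - 1) for k in A)   # which components fail
```

For $K = 2$ and `psi_first`, the output reproduces $\nu_1 = p_1/p$ and $\nu_2 = 1 - \nu_1$ from the two-component example.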



6. On the computer implementation of the change-diagnosis algorithm.
Updating the posterior probability process $\Pi$ online with a computer by using the recursive equations in (3.1) and (3.3) is fast. However, programming a computer to check online whether this process has just entered the optimal stopping region is a challenging task. This is especially so for two reasons. First, the critical boundaries of stopping regions do not have known closed-form expressions. Second, extensive online computations to determine whether one of these boundaries is crossed can take excessive time and defeat the purpose of quickest change detection. Here we outline an implementation strategy that should perform well in general.

The strategy is based on sparse offline representations of critical boundaries between stopping and continuation regions. Suppose that the posterior probability process $\Pi$ has just been updated to some $\pi = (\pi_0, \pi_1, \dots, \pi_M) \in S^M$. An alarm has to be raised if and only if $\pi \in \Gamma = \Gamma^{(1)} \cup \cdots \cup \Gamma^{(M)}$. Checking $\pi \in \Gamma^{(i)}$ for every $i = 1, \dots, M$ (in the worst case) is, however, unnecessary because
$$\pi \in \Gamma \iff \pi \in \Gamma^{(i)} \quad \text{if } h_i(\pi) = h(\pi) = \min_{1 \le j \le M} h_j(\pi).$$

In other words, one should

(i) find $i = \arg\min_{1 \le j \le M} h_j(\pi)$ first, and

(ii) raise an alarm and declare that a change of type $i$ has happened if $\pi \in \Gamma^{(i)}$, or

(iii) wait for at least one more period before raising any alarm otherwise.
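Steps (i)-(iii) amount to one comparison of $M$ linear functions followed by a single membership test. In the sketch below, the cost layout `a[k, j-1]` $= a_{kj}$ and the membership callback `in_region` are assumptions made for illustration. In practice the callback would evaluate a stored boundary representation such as the one developed in the rest of this section.

```python
import numpy as np

def diagnosis_step(pi, a, in_region):
    """Steps (i)-(iii) for a posterior pi = (pi_0, ..., pi_M).

    a[k, j-1] = a_{kj} is an (M+1) x M cost array; in_region(i, pi) is a
    hypothetical membership test for Gamma^(i).
    Returns the declared change type i, or None to continue sampling.
    """
    h = np.asarray(pi, dtype=float) @ np.asarray(a, dtype=float)   # h_j(pi), j = 1..M
    i = int(np.argmin(h)) + 1                                       # step (i), 1-based
    return i if in_region(i, pi) else None                          # steps (ii) and (iii)
```

Only one region, $\Gamma^{(i)}$, is ever queried per period, which is the point of the equivalence displayed above.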

Let us suppose that $i = \arg\min_{1 \le j \le M} h_j(\pi)$. Checking whether $\pi \in \Gamma^{(i)}$ will be fast if both $\pi$ and $\Gamma^{(i)}$ are represented in terms of polar coordinates, set up locally relative to the corner of $S^M$ confined in the convex set $\Gamma^{(i)}$.

To illustrate the ideas with simple pictures, we will focus on the case of $M = 2$ alternatives after the change; see Figure 6. If $\pi = \pi^\circ$ (respectively, $\pi = \pi^*$) as in Figure 6(a), then $h_1(\pi) < h_2(\pi)$ and $i = 1$ (respectively, $h_1(\pi) > h_2(\pi)$ and $i = 2$). In either case, $\pi$ can be identified relative to each corner in terms of two quantities. The first is the Euclidean distance to that corner, denoted by $r_j(\pi)$, $j = 0, 1, 2$. The second is one arbitrary but fixed angle, say $\beta_j(\pi)$, $j = 0, 1, 2$, indicated on Figure 6(a), between the line connecting $\pi$ and the corner and one of the two rays forming that corner. Every point on the critical boundary of the stopping region $\Gamma^{(j)}$, $j = 1, 2$, admits the same representation. Let $r = g_j(\beta)$ express the critical boundary of the stopping region $\Gamma^{(j)}$ in terms of the polar coordinates $(\beta, r)$ measured locally with respect to the corner of the simplex confined in $\Gamma^{(j)}$, for $j = 1, 2$. Then $\pi^\circ \in \Gamma$ if and only if $r_1(\pi^\circ) \le g_1(\beta_1(\pi^\circ))$, and $\pi^* \in \Gamma$ if and only if $r_2(\pi^*) \le g_2(\beta_2(\pi^*))$; see Figures 6(b) and 6(c), respectively.
Figure 6. For the sample problem displayed in Figure 4(a) ($M = 2$), optimal stopping regions and local polar coordinate systems are shown in (a). The critical boundaries of the stopping regions $\Gamma^{(1)}$ and $\Gamma^{(2)}$ are expressed in terms of local polar coordinates in (b) and (c), respectively, with the angle $\beta$ (in radians) on the horizontal axis. In (d), polar coordinates of $\pi$ are stated in terms of its Cartesian coordinates. As in Section 5.3, we drop $L$ from $L(\pi^\circ)$, $L(\Gamma^{(1)})$, $L(1,0,0)$, etc. and simply write $\pi^\circ$, $\Gamma^{(1)}$, $(1,0,0)$. In (a), $h_1(\pi^\circ) < h_2(\pi^\circ)$ and $h_1(\pi^*) > h_2(\pi^*)$.



The plan outlined above works well if (i) the local polar coordinates of $\pi$ can be identified online quickly, (ii) the local representations $g_j(\cdot)$, $j = 1, 2$, of the critical boundaries can be stored efficiently in computer memory, and (iii) they can be retrieved from there and evaluated quickly on demand. Next we explain how these requirements can be met.
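Recall from Section 5.3 and Figure 1 that $\pi \in S^2 \subset \mathbb{R}^3$ is embedded into the equilateral triangle $L(S^2) \subset \mathbb{R}^2$ by means of a linear map $\pi \mapsto L(\pi)$. In this natural representation of posterior distributions, $\pi = (\pi_0, \pi_1, \pi_2)$ is mapped to the point $L(\pi) = \left(\tfrac{2}{\sqrt{3}}\pi_1 + \tfrac{1}{\sqrt{3}}\pi_2,\ \pi_2\right)$. The corners of the equilateral triangle $L(S^2)$ are the images $L(1,0,0) = (0,0)$, $L(0,1,0) = \left(\tfrac{2}{\sqrt{3}}, 0\right)$, $L(0,0,1) = \left(\tfrac{1}{\sqrt{3}}, 1\right)$ of $(1,0,0)$, $(0,1,0)$, $(0,0,1)$, and the Euclidean distances of $L(\pi)$ to them are
$$\begin{aligned} r_0(\pi) &\triangleq \|L(1,0,0) - L(\pi)\| = \sqrt{\tfrac{4}{3}\left(\pi_1^2 + \pi_2^2 + \pi_1\pi_2\right)},\\ r_1(\pi) &\triangleq \|L(0,1,0) - L(\pi)\| = \sqrt{\tfrac{4}{3}\left(\pi_0^2 + \pi_2^2 + \pi_0\pi_2\right)},\\ r_2(\pi) &\triangleq \|L(0,0,1) - L(\pi)\| = \sqrt{\tfrac{4}{3}\left(\pi_0^2 + \pi_1^2 + \pi_0\pi_1\right)}, \end{aligned}$$
respectively; in a more compact way,
$$r_i(\pi) = \sqrt{\tfrac{4}{3}\Big(-\pi_i + \sum_{0 \le j \le k \le 2} \pi_j\pi_k\Big)}, \quad i = 0, 1, 2; \tag{6.1}$$
see Figure 6(d). The Euclidean distances of $L(\pi)$ to the edges opposite to the corners $L(1,0,0)$, $L(0,1,0)$, $L(0,0,1)$ are $\pi_0$, $\pi_1$, and $\pi_2$, respectively. Hence the angles identified in Figure 6(a) can also be calculated easily by
$$\beta_1(\pi) = \arcsin\frac{\pi_0}{\sqrt{\tfrac{4}{3}\left(\pi_0^2 + \pi_2^2 + \pi_0\pi_2\right)}} \quad\text{and}\quad \beta_2(\pi) = \arcsin\frac{\pi_1}{\sqrt{\tfrac{4}{3}\left(\pi_0^2 + \pi_1^2 + \pi_0\pi_1\right)}},$$
or more compactly by
$$\beta_i(\pi) = \arcsin\frac{\pi_{(i+2) \bmod 3}}{r_i(\pi)} = \arcsin\frac{\pi_{(i+2) \bmod 3}}{\sqrt{\tfrac{4}{3}\Big(-\pi_i + \sum_{0 \le j \le k \le 2} \pi_j\pi_k\Big)}}, \quad i = 0, 1, 2. \tag{6.2}$$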

Recall that at any $\pi \in S^2$ one has to calculate $\beta_i(\pi)$ and $r_i(\pi)$ only for $i = \arg\min_{1 \le j \le 2} h_j(\pi)$, and check whether $r_i(\pi) \le g_i(\beta_i(\pi))$ before raising an alarm.
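Formulas (6.1) and (6.2) can be verified against plane geometry. The sketch below computes $(\beta_i(\pi), r_i(\pi))$ from (6.1)-(6.2) and guards the corner itself, where $r_i = 0$. It then performs the test $r_i(\pi) \le g_i(\beta_i(\pi))$ for a user-supplied boundary function `g`. The helper names are illustrative. Under the corner labelling of Section 5.3, $\beta_i$ is the angle between $L(\pi) - L(e_i)$ and the edge from $L(e_i)$ to $L(e_{(i+1) \bmod 3})$, which is the edge opposite corner $(i+2) \bmod 3$.

```python
import numpy as np

# Images L(e_0), L(e_1), L(e_2) of the corners (Section 5.3); L(pi) = pi @ CORNERS_2D.
CORNERS_2D = np.array([[0.0, 0.0], [2 / np.sqrt(3), 0.0], [1 / np.sqrt(3), 1.0]])

def local_polar(pi, i):
    """Return (beta_i(pi), r_i(pi)) computed from (6.1) and (6.2)."""
    pi = np.asarray(pi, dtype=float)
    s = sum(pi[j] * pi[k] for j in range(3) for k in range(j, 3))  # sum_{j<=k} pi_j pi_k
    r = np.sqrt(4.0 / 3.0 * max(s - pi[i], 0.0))
    if r == 0.0:                      # pi is the corner itself
        return 0.0, 0.0
    beta = np.arcsin(np.clip(pi[(i + 2) % 3] / r, -1.0, 1.0))
    return beta, r

def in_gamma(pi, i, g):
    """Test pi in Gamma^(i) via r_i(pi) <= g(beta_i(pi)) for a stored boundary g."""
    beta, r = local_polar(pi, i)
    return r <= g(beta)
```

Online, the cost per period is a handful of arithmetic operations plus one evaluation of the stored $g_i$.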

Unfortunately, an exact closed-form representation $r = g_j(\beta)$ of the critical boundary of the stopping region $\Gamma^{(j)}$, $j = 1, 2$, in terms of the local polar coordinates $(\beta, r)$ relative to the corner confined in $\Gamma^{(j)}$ will almost never be available. Instead, only noisy observations of that relation can be obtained. The noise comes from the discretization of the state space $S^2$ and from terminating the value iteration at some finite stage. These observations are the pairs $(\beta_j(\pi), r_j(\pi))$ for every grid point $\pi$ on the (approximate) critical boundary of $\Gamma^{(j)}$, for every $j = 1, 2$. Interpolation between those points will certainly give an approximation of $r = g_j(\beta)$ for $j = 1, 2$. However, it may waste a lot of storage space and computational time during online evaluations, especially when the grid on $S^2$ is fine. Instead, one can use a statistical smoothing technique to compress the data with minimal loss of information.
Let us suppose that the observations $r^{(k)}$, $k = 1, \dots, N$, follow the model $r^{(k)} = g_i(\beta^{(k)}) + \varepsilon^{(k)}$ for every $k = 1, \dots, N$, and that $\varepsilon^{(k)}$, $k = 1, \dots, N$, are i.i.d. random variables with zero mean and some finite common variance. Because $\Gamma^{(i)}$ is convex, the function $\beta \mapsto g_i(\beta)$ is fairly smooth. It may therefore be plausible to approximate it by a cubic spline, that is, a twice continuously differentiable piecewise cubic polynomial. Fix an arbitrary smoothing parameter $\lambda > 0$. Among all twice-differentiable curves, there is a unique curve $\beta \mapsto \hat{g}_i(\beta)$ that minimizes the penalized sum of squared errors
$$S_\lambda(\hat{g}_i) \triangleq \sum_{k=1}^{N} \big[r^{(k)} - \hat{g}_i(\beta^{(k)})\big]^2 + \lambda \int \big[\hat{g}_i''(\beta)\big]^2\, d\beta. \tag{6.3}$$
It is known to belong to the family of cubic splines whose break-points are at $\beta^{(1)}, \dots, \beta^{(N)}$; see, for example, de Boor [7], Green and Silverman [11], and Ramsay and Silverman [17]. This optimality property and the ability to control the smoothness continuously through $\lambda$ make cubic splines an attractive candidate for an approximation of $g_i(\cdot)$. If the variation of the original curve $\beta \mapsto g_i(\beta)$ is moderate, then the number $K$ of break-points can be taken significantly less than the number $N$ of measurements. There are $O(K)$ algorithms that find the cubic spline minimizing (6.3) with the given $K$ break-points; see, for example, Green and Silverman [11, Section 2.3.3] for the Reinsch algorithm. Other algorithms represent the solution as a basis-function expansion
$$\hat{g}_i(\beta) = \sum_{j=1}^{K+3} c_j\,\phi_j(\beta)$$
in terms of $K + 3$ spline basis functions $\phi_1, \dots, \phi_{K+3}$. They solve the minimization problem in (6.3) by finding the coefficients $c_1, \dots, c_{K+3}$ using multiple regression; see Green and Silverman [11, Section 3.6] and Ramsay and Silverman [17, Section 3.5 and Chapter 5]. Thus, the approximation $\hat{g}_i(\cdot)$ of $g_i(\cdot)$ can be stored in computer memory for online use of the change-diagnosis algorithm by means of only $K + 3$ numbers $c_1, \dots, c_{K+3}$. The basis functions $\phi_1, \phi_2, \dots$ are cubic splines with compact support and can be stored easily and evaluated quickly online.
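Restricted to a cubic spline space with $K$ fixed break-points, the penalized fit (6.3) reduces to a single linear solve. The sketch below illustrates this under stated assumptions and is not one of the cited algorithms. It uses the truncated-power basis $1, \beta, \beta^2, \beta^3, (\beta - \kappa_1)_+^3, \dots, (\beta - \kappa_K)_+^3$, which has $K + 4$ functions; the count depends on the basis and boundary conventions, which is why it differs from the $K + 3$ above. It also approximates $\int [\hat g_i'']^2\,d\beta$ by the trapezoidal rule. For large $K$, a compactly supported B-spline basis, as described in the text, is numerically preferable. The function names are illustrative.

```python
import numpy as np

def tp_basis(beta, knots):
    """Truncated-power cubic basis 1, b, b^2, b^3, (b - k)_+^3 (K + 4 columns)."""
    b = np.asarray(beta, dtype=float)[:, None]
    cols = [np.ones_like(b), b, b**2, b**3] + [np.maximum(b - k, 0.0) ** 3 for k in knots]
    return np.hstack(cols)

def tp_basis_dd(beta, knots):
    """Second derivatives of the columns of tp_basis."""
    b = np.asarray(beta, dtype=float)[:, None]
    cols = [np.zeros_like(b), np.zeros_like(b), 2.0 * np.ones_like(b), 6.0 * b]
    cols += [6.0 * np.maximum(b - k, 0.0) for k in knots]
    return np.hstack(cols)

def fit_penalized_spline(beta_obs, r_obs, knots, lam, n_quad=2001):
    """Minimize sum_k (r_k - g(beta_k))^2 + lam * int g''(beta)^2 d(beta) over the spline space.

    The integral is approximated by the trapezoidal rule on [min beta, max beta].
    Returns coefficients c such that g(beta) = tp_basis(beta, knots) @ c.
    """
    beta_obs = np.asarray(beta_obs, dtype=float)
    X = tp_basis(beta_obs, knots)
    grid = np.linspace(beta_obs.min(), beta_obs.max(), n_quad)
    w = np.full(n_quad, grid[1] - grid[0])
    w[0] *= 0.5
    w[-1] *= 0.5                                  # trapezoidal weights
    B2 = tp_basis_dd(grid, knots)
    omega = B2.T @ (w[:, None] * B2)              # penalty matrix: int B_j'' B_l''
    return np.linalg.solve(X.T @ X + lam * omega, X.T @ np.asarray(r_obs, dtype=float))
```

Only the coefficient vector and the knot locations need to be stored. Online, evaluating $\hat g_i(\beta)$ is then a single dot product, `tp_basis([beta], knots) @ c`.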

All of the above ideas apply, without significantly affecting the online performance of the diagnosis algorithm, when the number $M$ of alternatives after the change is larger than two. For example, if $M = 3$, then $S^3 \subset \mathbb{R}^4$ is embedded into a tetrahedron $L(S^3) \subset \mathbb{R}^3$ by the linear map $\pi \mapsto L(\pi)$ defined in Section 5.4. The Euclidean distances of $L(\pi)$ to the images $L(1,0,0,0)$, $L(0,1,0,0)$, $L(0,0,1,0)$, $L(0,0,0,1)$ of $(1,0,0,0)$, $(0,1,0,0)$, $(0,0,1,0)$, $(0,0,0,1)$ are given by
$$\begin{aligned} r_0(\pi) &\triangleq \|L(1,0,0,0) - L(\pi)\| = \sqrt{\tfrac{3}{2}\left(\pi_1^2 + \pi_2^2 + \pi_3^2 + \pi_1\pi_2 + \pi_1\pi_3 + \pi_2\pi_3\right)},\\ r_1(\pi) &\triangleq \|L(0,1,0,0) - L(\pi)\| = \sqrt{\tfrac{3}{2}\left(\pi_0^2 + \pi_2^2 + \pi_3^2 + \pi_0\pi_2 + \pi_0\pi_3 + \pi_2\pi_3\right)},\\ r_2(\pi) &\triangleq \|L(0,0,1,0) - L(\pi)\| = \sqrt{\tfrac{3}{2}\left(\pi_0^2 + \pi_1^2 + \pi_3^2 + \pi_0\pi_1 + \pi_0\pi_3 + \pi_1\pi_3\right)},\\ r_3(\pi) &\triangleq \|L(0,0,0,1) - L(\pi)\| = \sqrt{\tfrac{3}{2}\left(\pi_0^2 + \pi_1^2 + \pi_2^2 + \pi_0\pi_1 + \pi_0\pi_2 + \pi_1\pi_2\right)}, \end{aligned}$$
respectively; or more compactly
$$r_i(\pi) = \sqrt{\tfrac{3}{2}\Big(-\pi_i + \sum_{0 \le j \le k \le 3} \pi_j\pi_k\Big)}, \quad i = 0, 1, 2, 3; \tag{6.4}$$
see Figure 7. The Euclidean distances of $L(\pi)$ to the faces of the tetrahedron opposite to the corners $L(1,0,0,0)$, $L(0,1,0,0)$, $L(0,0,1,0)$, $L(0,0,0,1)$ are $\pi_0$, $\pi_1$, $\pi_2$, and $\pi_3$, respectively. Therefore the distance $r_i(\pi)$ and two arbitrary but fixed angles $\beta_i(\pi) = (\beta_{i1}(\pi), \beta_{i2}(\pi))$, chosen out of the three angles defined by
$$\arcsin\frac{\pi_j}{r_i(\pi)} = \arcsin\frac{\pi_j}{\sqrt{\tfrac{3}{2}\Big(-\pi_i + \sum_{0 \le k \le \ell \le 3} \pi_k\pi_\ell\Big)}}, \quad 0 \le j \le 3,\ j \neq i, \tag{6.5}$$
form the local polar coordinates $(\beta_i(\pi), r_i(\pi))$ with respect to the corner of the simplex confined in $\Gamma^{(i)}$, $1 \le i \le 3$, and determine $L(\pi)$ uniquely.

The critical boundary between the stopping region $\Gamma^{(i)}$, $1 \le i \le 3$, and the continuation region can be represented by some surface $r = g_i(\beta)$ in terms of the local polar coordinate system $(\beta, r)$ just defined in the vicinity of $\Gamma^{(i)}$, where $\beta = (\beta_1, \beta_2)$ is now a vector. Let $(\beta^{(k)}, r^{(k)})$, $k = 1, \dots, N$, be the pairs $(\beta_i(\pi), r_i(\pi))$ evaluated at grid points $\pi$ on the approximate boundary of $\Gamma^{(i)}$. Then, for every arbitrary but fixed smoothing parameter $\lambda > 0$, one can fit a thin-plate spline $\hat{g}_i(\cdot)$ that minimizes the penalized sum of squared errors
$$S_\lambda(\hat{g}_i) \triangleq \sum_{k=1}^{N} \big[r^{(k)} - \hat{g}_i(\beta^{(k)})\big]^2 + \lambda \int_{\mathbb{R}^2} \sum_{1 \le \ell, m \le 2} \left(\frac{\partial^2 \hat{g}_i(\beta)}{\partial\beta_\ell\,\partial\beta_m}\right)^2 d\beta$$
among all twice-differentiable functions on $\mathbb{R}^2$. As before, $\hat{g}_i(\beta) = \sum_{j=1}^{K+3} c_j\,\phi_j(\beta)$ admits a basis-function expansion, and the coefficients $c_1, \dots, c_{K+3}$ can be found by multiple regression and stored in computer memory for the online use of change-diagnosis algorithms. See Green and Silverman [11, Chapter 7] for statistical data smoothing in three- and higher-dimensional Euclidean spaces using thin-plate splines. The similarity between the local polar coordinates (6.1), (6.2) for $M = 2$ and (6.4), (6.5) for $M = 3$ suggests the following for general $M \ge 2$ and a suitable constant $c_M > 0$ (e.g., $c_2 = 4/3$ and $c_3 = 3/2$). Consider the distances
$$r_i(\pi) = \sqrt{c_M\Big(-\pi_i + \sum_{0 \le j \le k \le M} \pi_j\pi_k\Big)}, \quad i = 0, 1, \dots, M,$$
and $M - 1$ arbitrary but fixed angles $\beta_i(\pi) = (\beta_{i1}(\pi), \dots, \beta_{i(M-1)}(\pi))$, chosen out of the $M$ angles defined by
$$\arcsin\frac{\pi_j}{r_i(\pi)} = \arcsin\frac{\pi_j}{\sqrt{c_M\Big(-\pi_i + \sum_{0 \le k \le \ell \le M} \pi_k\pi_\ell\Big)}}, \quad 0 \le j \le M,\ j \neq i.$$
After a suitable linear transformation $L$ into $\mathbb{R}^M$, these form a local polar coordinate system $(\beta_i(\pi), r_i(\pi))$ with respect to the corner of the simplex $S^M \subset \mathbb{R}^{M+1}$ confined in the stopping region $\Gamma^{(i)}$, $1 \le i \le M$.

Figure 7. Polar coordinates of $\pi$ (after transformation by $L$; see Section 5.4) in terms of its Cartesian coordinates.
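The general-$M$ formula can be checked numerically once a concrete $L$ is fixed. Assume that $L$ maps $S^M$ onto a regular $M$-simplex whose vertex-to-opposite-face heights all equal 1, as for $M = 2, 3$ above. Then $\|L(e_i) - L(\pi)\|^2 = \tfrac{M}{M+1}\big(1 - 2\pi_i + \sum_j \pi_j^2\big)$. On $S^M$ we also have $-\pi_i + \sum_{j \le k} \pi_j\pi_k = \tfrac{1}{2}\big(1 - 2\pi_i + \sum_j \pi_j^2\big)$. Together these give $c_M = 2M/(M+1)$, which reproduces $c_2 = 4/3$ and $c_3 = 3/2$. The sketch below realizes such an $L$ as $\pi \mapsto \sqrt{M/(M+1)}\,\pi$ inside $\mathbb{R}^{M+1}$, which is isometric to a copy in $\mathbb{R}^M$. It then verifies both the distance formula and the face-distance property. The function names are illustrative.

```python
import numpy as np

def r_formula(pi, i, c_M):
    """r_i(pi) = sqrt(c_M * (-pi_i + sum_{0<=j<=k<=M} pi_j pi_k))."""
    pi = np.asarray(pi, dtype=float)
    s = np.sum(np.triu(np.outer(pi, pi)))          # includes the diagonal j = k
    return np.sqrt(c_M * max(s - pi[i], 0.0))

def scale(M):
    """Assumed L: pi -> sqrt(M/(M+1)) * pi, a regular simplex with unit heights."""
    return np.sqrt(M / (M + 1.0))

def corner_distance(pi, i):
    """||L(e_i) - L(pi)|| under the assumed L."""
    pi = np.asarray(pi, dtype=float)
    e_i = np.eye(len(pi))[i]
    return scale(len(pi) - 1) * np.linalg.norm(e_i - pi)

def face_distance(pi, j):
    """Distance from L(pi) to the affine hull of the face opposite corner j."""
    pi = np.asarray(pi, dtype=float)
    s = scale(len(pi) - 1)
    face = s * np.delete(np.eye(len(pi)), j, axis=0)
    dirs = (face[1:] - face[0]).T                   # directions spanning the face
    y = s * pi - face[0]
    coef, *_ = np.linalg.lstsq(dirs, y, rcond=None)
    return np.linalg.norm(y - dirs @ coef)
```

For instance, with $M = 4$ the same check yields $c_4 = 8/5$. Any rotation of this $L$ leaves every distance, and therefore every angle in the display above, unchanged.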
Appendix A. Proofs



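A.1. Proof of Proposition 3.1. Note that since $\{\tau > n\} \in \mathcal{F}_n$ for every $n \ge 0$, we have
$$\mathbb{E}\big[(\tau - \theta)^+\big] = \mathbb{E}\Big[\sum_{n=0}^{\infty} 1_{\{\theta \le n < \tau\}}\Big] = \sum_{n=0}^{\infty} \mathbb{E}\big[1_{\{\tau > n\}}\,\mathbb{P}(\theta \le n \mid \mathcal{F}_n)\big] = \mathbb{E}\Big[\sum_{n=0}^{\tau-1} \big(1 - \Pi_n^{(0)}\big)\Big].$$
Moreover, for every $j \in \mathcal{M}$, we have $\{\tau = n, d = j\} \in \mathcal{F}_n$, and $\mathbb{E}[1_{\{d=j\}} 1_{\{\tau < \theta\}}]$ equals
$$\sum_{n=0}^{\infty} \mathbb{E}\big[1_{\{\tau=n,\,d=j\}} 1_{\{\theta > n\}}\big] = \lim_{N\to\infty} \sum_{n=0}^{N} \mathbb{E}\big[1_{\{\tau=n,\,d=j\}}\,\Pi_n^{(0)}\big] = \lim_{N\to\infty} \mathbb{E}\big[1_{\{\tau \le N,\,d=j\}}\,\Pi_\tau^{(0)}\big] = \mathbb{E}\big[1_{\{\tau < \infty,\,d=j\}}\,\Pi_\tau^{(0)}\big]$$
because of the monotone convergence theorem and the fact that $\lim_{N\to\infty}\{\tau \le N\} = \bigcup_{n=0}^{\infty}\{\tau \le n\} = \{\tau < \infty\}$; see, for example, Ross [18]. Similarly, $\mathbb{E}[1_{\{d=j,\,\mu=i\}} 1_{\{\theta \le \tau < \infty\}}]$ equals
$$\sum_{n=0}^{\infty} \mathbb{E}\big[1_{\{\tau=n,\,d=j\}} 1_{\{\theta \le n,\,\mu=i\}}\big] = \sum_{n=0}^{\infty} \mathbb{E}\big[1_{\{\tau=n,\,d=j\}}\,\Pi_n^{(i)}\big] = \mathbb{E}\big[1_{\{\tau < \infty,\,d=j\}}\,\Pi_\tau^{(i)}\big]$$
for every $i \in \mathcal{M}$. Plugging these expressions into (2.2) completes the proof. □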



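A.2. Proof of Proposition 3.2, Parts (a) and (b). Fix any $A = \{(X_1, \dots, X_n) \in B\} \in \mathcal{F}_n$ for some Borel $B \subseteq E^n$. Then (2.1) implies that
$$\mathbb{P}(A) = \int_B m(dx_1)\cdots m(dx_n)\,\alpha_n(x_1, \dots, x_n), \tag{A.1}$$
where $\alpha_n(x_1, \dots, x_n) = \sum_{i=0}^{M} \alpha_n^{(i)}(x_1, \dots, x_n)$, and
$$\alpha_n^{(i)}(x_1, \dots, x_n) = \begin{cases} (1-p_0)(1-p)^n \prod_{\ell=1}^{n} f_0(x_\ell), & i = 0,\\ p_0\nu_i \prod_{\ell=1}^{n} f_i(x_\ell) + (1-p_0)\sum_{k=1}^{n} p(1-p)^{k-1}\nu_i \prod_{\ell=1}^{k-1} f_0(x_\ell) \prod_{\ell=k}^{n} f_i(x_\ell), & i \in \mathcal{M}. \end{cases}$$
Hence, $\alpha_n(x_1, \dots, x_n)$ is the joint probability density function of $X_1, \dots, X_n$ with respect to the measure $m(dx_1)\cdots m(dx_n)$. Now for $i \in \mathcal{M}$,
$$\int_A \Pi_n^{(i)}\,d\mathbb{P} = \mathbb{P}\big(A \cap \{\theta \le n, \mu = i\}\big) = \int_B m(dx_1)\cdots m(dx_n)\,\alpha_n^{(i)}(x_1, \dots, x_n) = \int_B m(dx_1)\cdots m(dx_n)\,\alpha_n(x_1, \dots, x_n)\,\frac{\alpha_n^{(i)}(x_1, \dots, x_n)}{\alpha_n(x_1, \dots, x_n)} = \int_A \frac{\alpha_n^{(i)}(X_1, \dots, X_n)}{\alpha_n(X_1, \dots, X_n)}\,d\mathbb{P}.$$
Hence,
$$\Pi_n^{(i)} = \frac{\alpha_n^{(i)}(X_1, \dots, X_n)}{\alpha_n(X_1, \dots, X_n)},\ i \in \mathcal{M}, \quad\text{and}\quad \Pi_n^{(0)} = \frac{\alpha_n^{(0)}(X_1, \dots, X_n)}{\alpha_n(X_1, \dots, X_n)}, \tag{A.2}$$
since $\sum_{i=0}^{M} \Pi_n^{(i)} = 1$. Similar considerations also give, for every $i \in \mathcal{M}$,
$$\mathbb{P}\{\theta = k, \mu = i \mid \mathcal{F}_n\} = \frac{1}{\alpha_n(X_1, \dots, X_n)} \times \begin{cases} p_0\nu_i \prod_{\ell=1}^{n} f_i(X_\ell), & k = 0,\\ (1-p_0)(1-p)^{k-1}p\,\nu_i \prod_{\ell=1}^{k-1} f_0(X_\ell) \prod_{\ell=k}^{n} f_i(X_\ell), & 1 \le k \le n,\\ (1-p_0)(1-p)^{k-1}p\,\nu_i \prod_{\ell=1}^{n} f_0(X_\ell), & k \ge n+1. \end{cases}$$
Observe that for $k \ge n+1$, $\mathbb{P}\{\theta = k, \mu = i \mid \mathcal{F}_n\} = \Pi_n^{(0)}(1-p)^{k-n-1}p\,\nu_i$. In particular, $\mathbb{P}\{\theta = n+1, \mu = i \mid \mathcal{F}_n\} = \Pi_n^{(0)}p\,\nu_i$, and $\mathbb{P}\{\theta \le n+1, \mu = i \mid \mathcal{F}_n\}$ equals
$$\mathbb{P}\{\theta \le n, \mu = i \mid \mathcal{F}_n\} + \mathbb{P}\{\theta = n+1, \mu = i \mid \mathcal{F}_n\} = \Pi_n^{(i)} + \Pi_n^{(0)}p\,\nu_i.$$
Note also that $\mathbb{P}\{\theta > n+1 \mid \mathcal{F}_n\}$ equals
$$\sum_{k=n+2}^{\infty}\sum_{i=1}^{M} \mathbb{P}\{\theta = k, \mu = i \mid \mathcal{F}_n\} = \sum_{k=n+2}^{\infty} \Pi_n^{(0)}(1-p)^{k-n-1}p = \Pi_n^{(0)}(1-p).$$
Thus, $\mathbb{E}[\Pi_{n+1}^{(0)} \mid \mathcal{F}_n] = \mathbb{P}\{\theta > n+1 \mid \mathcal{F}_n\} = \Pi_n^{(0)}(1-p) \le \Pi_n^{(0)}$, and
$$\mathbb{E}[\Pi_{n+1}^{(i)} \mid \mathcal{F}_n] = \mathbb{P}\{\theta \le n+1, \mu = i \mid \mathcal{F}_n\} = \Pi_n^{(i)} + p\,\nu_i\,\Pi_n^{(0)} \ge \Pi_n^{(i)}, \quad i \in \mathcal{M}.$$
Hence, $\{\Pi_n^{(0)}, \mathcal{F}_n\}_{n \ge 0}$ is a supermartingale, and $\{\Pi_n^{(i)}, \mathcal{F}_n\}_{n \ge 0}$, $i \in \mathcal{M}$, are submartingales.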
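For the proof of Part (c), note first that
$$\alpha_{n+1}^{(i)}(x_1, \dots, x_{n+1}) = \begin{cases} \big[\alpha_n^{(i)}(x_1, \dots, x_n) + p\,\nu_i\,\alpha_n^{(0)}(x_1, \dots, x_n)\big] f_i(x_{n+1}), & i \in \mathcal{M},\\ (1-p)\,\alpha_n^{(0)}(x_1, \dots, x_n)\,f_0(x_{n+1}), & i = 0. \end{cases} \tag{A.3}$$
Write $\Pi_{n+1}^{(i)}$, $i \in \{0\} \cup \mathcal{M}$, by using (A.2), substitute these expressions, and then divide both numerator and denominator by $\alpha_n(X_1, \dots, X_n)$; this gives (3.3).

Next, we find the conditional distribution of $X_{n+1}$ given $\mathcal{F}_n$ for $n \ge 0$. If $g : E \to \mathbb{R}_+$ is a nonnegative function and $A = \{(X_1, \dots, X_n) \in B\} \in \mathcal{F}_n$, then $\int_A \mathbb{E}[g(X_{n+1}) \mid \mathcal{F}_n]\,d\mathbb{P}$ equals
$$\begin{aligned} \int_A g(X_{n+1})\,d\mathbb{P} &= \int_{B \times E} g(x_{n+1})\,\alpha_{n+1}(x_1, \dots, x_{n+1})\,m(dx_1)\cdots m(dx_{n+1})\\ &= \int_B \left[\int_E g(x_{n+1})\,\frac{\alpha_{n+1}(x_1, \dots, x_{n+1})}{\alpha_n(x_1, \dots, x_n)}\,m(dx_{n+1})\right] \alpha_n(x_1, \dots, x_n)\,m(dx_1)\cdots m(dx_n)\\ &= \int_A \left[\int_E g(x_{n+1})\,\frac{\alpha_{n+1}(X_1, \dots, X_n, x_{n+1})}{\alpha_n(X_1, \dots, X_n)}\,m(dx_{n+1})\right] d\mathbb{P}. \end{aligned}$$
Therefore, we have
$$\mathbb{E}[g(X_{n+1}) \mid \mathcal{F}_n] = \int_E g(x)\,\frac{\alpha_{n+1}(X_1, \dots, X_n, x)}{\alpha_n(X_1, \dots, X_n)}\,m(dx) = \int_E g(x)\,D(\Pi_n, x)\,m(dx), \tag{A.4}$$
where the second equality follows from (A.2) after substituting (A.3) into the previous equality, and the mapping $D$ was defined by (3.1). Then for every nonnegative function $f : S^M \to \mathbb{R}_+$, (3.3) and (A.4) imply that
$$\mathbb{E}[f(\Pi_{n+1}) \mid \mathcal{F}_n] = \int_E m(dx)\,D(\Pi_n, x)\,f\!\left(\frac{D_0(\Pi_n, x)}{D(\Pi_n, x)}, \dots, \frac{D_M(\Pi_n, x)}{D(\Pi_n, x)}\right) = (\mathbb{T}f)(\Pi_n)$$
in terms of the operator $\mathbb{T}$ defined by (3.2), and $\mathbb{E}[f(\Pi_{n+1}) \mid \mathcal{F}_n] = \mathbb{E}[f(\Pi_{n+1}) \mid \Pi_n]$. Therefore, the process $\{\Pi_n, \mathcal{F}_n; n \ge 0\}$ is Markov, and the proof of part (c) is complete. □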



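A.3. Proofs of Lemmas 4.1 and 4.2. Before proving the lemmas, we state Definition A.1, Theorem A.2, and Lemma A.3 from Chow et al. [5, pp. 62-69] for ease of reference.

Definition A.1. A collection $(\zeta_t)_{t \in T}$ of random variables is called directed upwards if, for every $u, v \in T$, there exists $t \in T$ such that $\zeta_t \ge \max\{\zeta_u, \zeta_v\}$ almost surely.

Theorem A.2. If a collection $(\zeta_t)_{t \in T}$ of random variables is directed upwards, then for every $t_0 \in T$, there exists a nondecreasing sequence $(\zeta_{t_n})_{n \ge 0}$ in the collection $(\zeta_t)_{t \in T}$, starting with $\zeta_{t_0}$, such that
$$\operatorname*{ess\,sup}_{t \in T} \zeta_t = \lim_{n\to\infty} \uparrow \zeta_{t_n} \quad \text{almost surely}.$$

Lemma A.3. For every $n \ge 0$, the collection $\{\mathbb{E}[Y_\tau \mid \mathcal{F}_n] \mid \tau \in \mathcal{C}_n\}$ is directed upwards.

Proof of Lemma 4.1. To prove the lemma, we establish two inequalities. Note that $\gamma_n \ge \mathbb{E}[Y_\tau \mid \mathcal{F}_n]$ for all $\tau \in \mathcal{C}_n$ by definition. So, taking expectations, we obtain $\mathbb{E}\gamma_n \ge \sup_{\tau \in \mathcal{C}_n} \mathbb{E}Y_\tau = -V_n$. For the reverse direction, by Theorem A.2 and Lemma A.3 there exists a sequence of stopping times $(\tau_k)_{k \ge 1} \subseteq \mathcal{C}_n$ such that $Y_n \le \mathbb{E}[Y_{\tau_k} \mid \mathcal{F}_n] \uparrow \gamma_n$ as $k \to \infty$. So, by the monotone convergence theorem, we have $\mathbb{E}\gamma_n = \mathbb{E}\big[\lim_{k\to\infty} \mathbb{E}[Y_{\tau_k} \mid \mathcal{F}_n]\big] = \lim_{k\to\infty} \mathbb{E}Y_{\tau_k} \le -V_n$. The proof of the equations $-V_n^N = \mathbb{E}\gamma_n^N$, $0 \le n \le N$, is similar. □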
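Proof of Lemma 4.2. We have $\gamma_n \le \max\{Y_n, \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]\}$, because for every fixed $\tau \in \mathcal{C}_n$, the expectation $\mathbb{E}[Y_\tau \mid \mathcal{F}_n]$ equals
$$\begin{aligned} \mathbb{E}\big[Y_n 1_{\{\tau=n\}} + Y_{\tau \vee (n+1)} 1_{\{\tau \ge n+1\}} \,\big|\, \mathcal{F}_n\big] &= Y_n 1_{\{\tau=n\}} + 1_{\{\tau \ge n+1\}}\,\mathbb{E}\big[\mathbb{E}[Y_{\tau \vee (n+1)} \mid \mathcal{F}_{n+1}] \,\big|\, \mathcal{F}_n\big]\\ &\le Y_n 1_{\{\tau=n\}} + 1_{\{\tau \ge n+1\}}\,\mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n] \le \max\{Y_n, \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]\}. \end{aligned}$$
For the reverse direction, note that $\gamma_n \ge Y_n = \mathbb{E}[Y_n \mid \mathcal{F}_n]$ by definition. Since $\gamma_{n+1} = \operatorname{ess\,sup}_{\tau \in \mathcal{C}_{n+1}} \mathbb{E}[Y_\tau \mid \mathcal{F}_{n+1}]$, by Theorem A.2 and Lemma A.3 there exists a sequence of stopping times $(\tau_k)_{k \ge 1} \subseteq \mathcal{C}_{n+1}$ such that $Y_{n+1} \le \mathbb{E}[Y_{\tau_k} \mid \mathcal{F}_{n+1}] \uparrow \gamma_{n+1}$ as $k \to \infty$. Since $\mathcal{C}_{n+1} \subseteq \mathcal{C}_n$ for all $n \ge 0$, we have $\gamma_n \ge \mathbb{E}[Y_{\tau_k} \mid \mathcal{F}_n] = \mathbb{E}\big[\mathbb{E}[Y_{\tau_k} \mid \mathcal{F}_{n+1}] \,\big|\, \mathcal{F}_n\big]$ for all $k \ge 1$. Taking the limit as $k \to \infty$ and applying the monotone convergence theorem, we have $\gamma_n \ge \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]$. Therefore, $\gamma_n \ge \max\{Y_n, \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]\}$. By a similar argument, we can establish the other equations of Lemma 4.2. □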



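A.4. Proof of Lemma 4.3. Because $(\mathcal{C}_n^N)_{N \ge n}$ is increasing for every $n \ge 0$, the sequence $(\gamma_n^N)_{N \ge n}$ is increasing for every $n \ge 0$ and has a limit. Set $\gamma_n^\infty = \lim_{N\to\infty} \gamma_n^N$, $n \ge 0$. Because $\gamma_{n+1}^N \ge \gamma_{n+1}^{n+1} = Y_{n+1}$ and $Y_{n+1}$ is integrable, we may take limits in $\gamma_n^N = \max\{Y_n, \mathbb{E}[\gamma_{n+1}^N \mid \mathcal{F}_n]\}$ (see Lemma 4.2). Monotone convergence then gives $\gamma_n^\infty = \max\{Y_n, \mathbb{E}[\gamma_{n+1}^\infty \mid \mathcal{F}_n]\}$ for every $n \ge 0$. In particular, $(\gamma_n^\infty)_{n \ge 0}$ is an $\mathbb{F}$-supermartingale.

Obviously, $\gamma_n^\infty \le \gamma_n$ for every $n \ge 0$. To prove the reverse inequality, it is enough to show that $\gamma_n^\infty \ge \mathbb{E}[Y_\tau \mid \mathcal{F}_n]$ for every $\tau \in \mathcal{C}_n$. Take any $\tau \in \mathcal{C}_n$. Then for every $F \in \mathcal{F}_n$ and $m > n$,
$$\begin{aligned} \int_F \gamma_n^\infty\,d\mathbb{P} &= \int_{F \cap \{\tau=n\}} \gamma_n^\infty\,d\mathbb{P} + \int_{F \cap \{\tau>n\}} \gamma_n^\infty\,d\mathbb{P} \ge \int_{F \cap \{n \le \tau < n+1\}} \gamma_\tau^\infty\,d\mathbb{P} + \int_{F \cap \{\tau \ge n+1\}} \gamma_{n+1}^\infty\,d\mathbb{P}\\ &\ge \cdots \ge \int_{F \cap \{n \le \tau < m\}} \gamma_\tau^\infty\,d\mathbb{P} + \int_{F \cap \{\tau \ge m\}} \gamma_m^\infty\,d\mathbb{P}, \end{aligned}$$
where the inequalities follow from the $\mathbb{F}$-supermartingale property of the process $(\gamma_n^\infty)_{n \ge 0}$. Because $\gamma_k^\infty \ge Y_k$ for every $k \ge 0$, we have $\gamma_\tau^\infty \ge Y_\tau$, and for every $m > n$
$$\int_F \gamma_n^\infty\,d\mathbb{P} \ge \int_{F \cap \{n \le \tau < m\}} Y_\tau\,d\mathbb{P} + \int_{F \cap \{\tau \ge m\}} \gamma_m^\infty\,d\mathbb{P} \ge \int_{F \cap \{n \le \tau < m\}} Y_\tau\,d\mathbb{P} - \int_{F \cap \{\tau \ge m\}} (\gamma_m^\infty)^-\,d\mathbb{P}.$$
Since $Y_\tau = -Y_\tau^-$ is integrable and $\tau < \infty$ a.s., dominated convergence gives $\lim_{m\to\infty} \int_{F \cap \{n \le \tau < m\}} Y_\tau\,d\mathbb{P} = \int_F Y_\tau\,d\mathbb{P}$. The proof will therefore be complete if $\lim_{m\to\infty} \int_{\{\tau \ge m\}} (\gamma_m^\infty)^-\,d\mathbb{P} = 0$. Since $\gamma_m^\infty \ge Y_m$, we have $(\gamma_m^\infty)^- \le Y_m^-$, and $\int_{\{\tau \ge m\}} (\gamma_m^\infty)^-\,d\mathbb{P}$ is less than or equal to
$$\int_{\{\tau \ge m\}} Y_m^-\,d\mathbb{P} \le \int_{\{\tau \ge m\}} c\,m\,d\mathbb{P} + \|h\|\,\mathbb{P}\{\tau \ge m\} \le c\,\mathbb{E}\big[\tau 1_{\{\tau \ge m\}}\big] + \|h\|\,\mathbb{P}\{\tau \ge m\},$$
where $\|h\| \triangleq \sup_{\pi \in S^M} |h(\pi)|$. Since $h(\cdot)$ is bounded, $\mathbb{E}\tau < \infty$ and $\mathbb{P}\{\tau < \infty\} = 1$, the right-hand side of the last inequality converges to zero as $m \to \infty$. □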
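A.5. Proof of Lemma 4.4. Fix any $N \ge 1$. We first verify the equality for $n = N$. On the one hand, the definition of the random variable $\gamma_N^N$ in (4.2) implies that
$$\gamma_N^N = \operatorname*{ess\,sup}_{\tau \in \mathcal{C}_N^N} \mathbb{E}[Y_\tau \mid \mathcal{F}_N] = \mathbb{E}[Y_N \mid \mathcal{F}_N] = Y_N$$
because $\mathcal{C}_N^N = \{N\}$. On the other hand, by the definition of the operator $\mathbb{M}$ in (4.3) we have $\mathbb{M}^0 h = h$ and
$$-c\sum_{k=0}^{N-1}\big(1 - \Pi_k^{(0)}\big) - (\mathbb{M}^0 h)(\Pi_N) = -c\sum_{k=0}^{N-1}\big(1 - \Pi_k^{(0)}\big) - h(\Pi_N) = Y_N,$$
thanks to (3.5). Therefore, (4.4) holds for $n = N$. Now suppose that (4.4) is true for some $1 \le n \le N$. Then $\gamma_{n-1}^N = \max\{Y_{n-1}, \mathbb{E}[\gamma_n^N \mid \mathcal{F}_{n-1}]\}$ equals
$$\begin{aligned} &\max\Big\{-c\sum_{k=0}^{n-2}\big(1 - \Pi_k^{(0)}\big) - h(\Pi_{n-1}),\ \mathbb{E}\Big[-c\sum_{k=0}^{n-1}\big(1 - \Pi_k^{(0)}\big) - (\mathbb{M}^{N-n}h)(\Pi_n) \,\Big|\, \mathcal{F}_{n-1}\Big]\Big\}\\ &\quad= -c\sum_{k=0}^{n-2}\big(1 - \Pi_k^{(0)}\big) - \min\Big\{h(\Pi_{n-1}),\ c\big(1 - \Pi_{n-1}^{(0)}\big) + \big(\mathbb{T}(\mathbb{M}^{N-n}h)\big)(\Pi_{n-1})\Big\}\\ &\quad= -c\sum_{k=0}^{n-2}\big(1 - \Pi_k^{(0)}\big) - \big(\mathbb{M}(\mathbb{M}^{N-n}h)\big)(\Pi_{n-1}) = -c\sum_{k=0}^{n-2}\big(1 - \Pi_k^{(0)}\big) - (\mathbb{M}^{N-(n-1)}h)(\Pi_{n-1}). \end{aligned}$$
By induction, the equality holds for all $0 \le n \le N$. □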



A.6. Proof of Lemma 4.5. Applying Lemma 4.4 for $n = 0$ yields part (a), since
$$V_0^N = -\mathbb{E}\gamma_0^N = (\mathbb{M}^N h)(\Pi_0), \quad N \ge 0. \tag{A.5}$$
By Lemma 4.3, $\gamma_n = \lim_{N\to\infty} \gamma_n^N$, and so $V_n = \lim_{N\to\infty} V_n^N$ by Lemma 4.1 and the dominated convergence theorem. Since the left-hand side of (A.5) converges to $V_0$ as $N \to \infty$, the limit of the right-hand side as $N \to \infty$ exists and $V_0 = \lim_{N\to\infty} (\mathbb{M}^N h)(\Pi_0)$, which proves part (b). □
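A.7. Proof of Lemma 4.8. Given $\pi, \pi' \in S^M$, $\lambda \in [0, 1]$, and $\lambda' = 1 - \lambda$, we have
$$\begin{aligned} \lambda(\mathbb{T}g)(\pi) + \lambda'(\mathbb{T}g)(\pi') &= \lambda \int_E m(dx)\,D(\pi, x)\,g\!\left(\frac{D_0(\pi, x)}{D(\pi, x)}, \dots, \frac{D_M(\pi, x)}{D(\pi, x)}\right) + \lambda' \int_E m(dx)\,D(\pi', x)\,g\!\left(\frac{D_0(\pi', x)}{D(\pi', x)}, \dots, \frac{D_M(\pi', x)}{D(\pi', x)}\right)\\ &= \int_E m(dx)\,\big[\lambda D(\pi, x) + \lambda' D(\pi', x)\big] \left[\frac{\lambda D(\pi, x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)}\,g\!\left(\frac{D_0(\pi, x)}{D(\pi, x)}, \dots, \frac{D_M(\pi, x)}{D(\pi, x)}\right)\right.\\ &\qquad\qquad \left. +\ \frac{\lambda' D(\pi', x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)}\,g\!\left(\frac{D_0(\pi', x)}{D(\pi', x)}, \dots, \frac{D_M(\pi', x)}{D(\pi', x)}\right)\right]. \end{aligned}$$
Because $g$ is concave and
$$\frac{\lambda D(\pi, x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)} + \frac{\lambda' D(\pi', x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)} = 1$$
is a convex combination, we continue the chain with an inequality to obtain
$$\begin{aligned} \lambda(\mathbb{T}g)(\pi) + \lambda'(\mathbb{T}g)(\pi') &\le \int_E m(dx)\,\big[\lambda D(\pi, x) + \lambda' D(\pi', x)\big]\,g\!\left(\frac{\lambda D_0(\pi, x) + \lambda' D_0(\pi', x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)}, \dots, \frac{\lambda D_M(\pi, x) + \lambda' D_M(\pi', x)}{\lambda D(\pi, x) + \lambda' D(\pi', x)}\right)\\ &= \int_E m(dx)\,D(\lambda\pi + \lambda'\pi', x)\,g\!\left(\frac{D_0(\lambda\pi + \lambda'\pi', x)}{D(\lambda\pi + \lambda'\pi', x)}, \dots, \frac{D_M(\lambda\pi + \lambda'\pi', x)}{D(\lambda\pi + \lambda'\pi', x)}\right) = (\mathbb{T}g)(\lambda\pi + \lambda'\pi'). \end{aligned}$$
Note that the second-to-last equality follows from the fact that each of $D_0, \dots, D_M, D$ is linear in its first argument. So, we have established that $\mathbb{T}g$ is concave. □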



A.8. Proof of Proposition 4.9. The function $h(\pi) = \min_{j \in \mathcal{M}} \sum_{i=0}^{M} \pi_i a_{ij}$ is concave, being a minimum of linear functions. The pointwise minimum of two concave functions is also concave. Hence, by Lemma 4.8, the function $(\mathbb{M}f)(\pi) = \min\{h(\pi), c(1 - \pi_0) + (\mathbb{T}f)(\pi)\}$ is concave for every bounded concave $f : S^M \to \mathbb{R}$. Therefore, $\mathbb{M}h, \mathbb{M}^2 h, \dots$ are concave, and $V_0^1, V_0^2, \dots$ are concave by Lemma 4.5(a). This proves part (a). For part (b), note that Lemma 4.5(b) implies that $V_0(\pi) = \lim_{N\to\infty} (\mathbb{M}^N h)(\pi)$ for every $\pi \in S^M$; thus, $V_0(\cdot)$ is concave on $S^M$. □



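A.9. Proof of Proposition 4.10. The inequality $-V_0(\pi) \ge -V_0^N(\pi)$ for every $\pi \in S^M$ and $N \ge 1$ is obvious. Let us prove the second inequality. Fix $N \ge 1$, $\pi \in S^M$, and any $\varepsilon > 0$. Since
$$0 \ge -V_0(\pi) = \sup_{\tau \in \mathcal{C}_0} \mathbb{E}_\pi Y_\tau \ge \mathbb{E}_\pi Y_0 \ge -\|h\| > -\infty$$
is finite, there exists some stopping time $\tau_\varepsilon \in \mathcal{C}_0$ such that
$$-V_0(\pi) - \varepsilon \le \mathbb{E}_\pi Y_{\tau_\varepsilon} = \mathbb{E}_\pi\Big[-c\sum_{k=0}^{\tau_\varepsilon - 1}\big(1 - \Pi_k^{(0)}\big) - h(\Pi_{\tau_\varepsilon})\Big]. \tag{A.6}$$
Observe that $\tau_\varepsilon \wedge N \in \mathcal{C}_0^N$ and
$$-V_0^N(\pi) \ge \mathbb{E}_\pi Y_{\tau_\varepsilon \wedge N} \ge \mathbb{E}_\pi\Big[-c\sum_{k=0}^{\tau_\varepsilon - 1}\big(1 - \Pi_k^{(0)}\big) - h(\Pi_{\tau_\varepsilon})\Big] - \|h\|\,\mathbb{P}_\pi\{\tau_\varepsilon > N\} \ge -V_0(\pi) - \varepsilon - \frac{\|h\|}{N}\,\mathbb{E}_\pi\tau_\varepsilon. \tag{A.7}$$
The last inequality follows from the Markov inequality applied to $\mathbb{P}_\pi\{\tau_\varepsilon > N\}$ and from the $\varepsilon$-optimality of $\tau_\varepsilon$ for $V_0$. Next, we bound $\mathbb{E}_\pi\tau_\varepsilon$ from above by using (A.6):
$$-V_0(\pi) - \varepsilon \le \mathbb{E}_\pi\Big[-c\sum_{k=0}^{\tau_\varepsilon - 1}\big(1 - \Pi_k^{(0)}\big)\Big] = -c\,\mathbb{E}_\pi\tau_\varepsilon + c\,\mathbb{E}_\pi\sum_{k=0}^{\tau_\varepsilon - 1}\Pi_k^{(0)} \le -c\,\mathbb{E}_\pi\tau_\varepsilon + c\,\mathbb{E}_\pi\sum_{k=0}^{\infty}\Pi_k^{(0)} = -c\,\mathbb{E}_\pi\tau_\varepsilon + c\sum_{k=0}^{\infty}\mathbb{E}_\pi\Pi_k^{(0)}.$$
Rearranging, and using the inequality $\mathbb{E}_\pi\Pi_k^{(0)} \le (1-p)^k$ of Proposition 3.2(a), gives
$$\mathbb{E}_\pi\tau_\varepsilon \le \frac{1}{c}\big[V_0(\pi) + \varepsilon\big] + \frac{1}{p} \le \frac{\|h\| + \varepsilon}{c} + \frac{1}{p}.$$
Now using this bound on $\mathbb{E}_\pi\tau_\varepsilon$ in (A.7), we have
$$-V_0^N(\pi) \ge -V_0(\pi) - \varepsilon - \frac{\|h\|}{N}\left(\frac{\|h\| + \varepsilon}{c} + \frac{1}{p}\right).$$
Since $\varepsilon$ was arbitrary, taking the limit as $\varepsilon \downarrow 0$ gives the desired bound. □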



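A.10. Proof of Proposition 4.11. Recall that $V_0^0(\pi) = h(\pi) = \min_{i\in\mathcal{M}} \sum_{j=0}^{M} \pi_j a_{ji}$, which is continuous in $\pi \in S^M$. Suppose that $V_0^N : S^M \mapsto \mathbb{R}_+$ is continuous for some $N \ge 0$. Then by (A.5)

$$V_0^{N+1}(\pi) = (M^{N+1}h)(\pi) = (MV_0^N)(\pi) = \min\{h(\pi),\ c(1-\pi_0) + (TV_0^N)(\pi)\}, \tag{A.8}$$

where (see (3.2))

$$(TV_0^N)(\pi) = \int_E m(dx)\, D(\pi,x)\, V_0^N\Big(\frac{D_0(\pi,x)}{D(\pi,x)}, \ldots, \frac{D_M(\pi,x)}{D(\pi,x)}\Big). \tag{A.9}$$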



Note that

• the mapping $\pi \mapsto D(\pi,x)$ is continuous for every $x \in E$,

• for every $x \in E$ such that $D(\pi,x) > 0$ (these are the $x$-values that matter in the defining integral of $(TV_0^N)(\pi)$ above), the coordinates $\pi \mapsto \frac{D_0(\pi,x)}{D(\pi,x)}, \ldots, \frac{D_M(\pi,x)}{D(\pi,x)}$ are continuous,

• since $V_0^N(\cdot)$ is continuous on $S^M$ by the induction hypothesis, the integrand in (A.9) is continuous in $\pi$ for every fixed $x \in E$ such that $D(\pi,x) > 0$,

• since $0 \le V_0^N(\cdot) \le \|h\|$, the same nonnegative integrand is bounded from above by the integrable function $\|h\|\sum_{j=0}^{M} f_j(x)$ for every $\pi \in S^M$,

• then the mapping $\pi \mapsto (TV_0^N)(\pi)$ is continuous by dominated convergence,

• and finally, since $h(\pi)$ and $c(1-\pi_0) + (TV_0^N)(\pi)$ are continuous, (A.8) implies that the mapping $\pi \mapsto V_0^{N+1}(\pi)$ is continuous.

Hence, continuity holds for every $N \ge 0$ by induction, and this completes the proof. □
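The recursion (A.8)-(A.9) can be iterated numerically. The sketch below does so in Python for the special case $M = 1$ (pure change detection) with a finite observation space, so that $m$ is counting measure and the integral in (A.9) becomes a sum. As hypotheses of the sketch, not statements from this section, it assumes the weights $D_0(\pi,x) = (1-p)\pi_0 f_0(x)$ and $D_1(\pi,x) = (\pi_1 + p\pi_0) f_1(x)$, and the costs $a_{01} > 0$ and $a_{11} = 0$, so that $h(\pi) = a_{01}\pi_0$. It approximates $V_0^N$ on a uniform grid in $\pi_0$ using linear interpolation. That discretization is our own choice and need not match the paper's numerical scheme, and all parameter values are hypothetical.

```python
import numpy as np

def value_iteration(f0, f1, p, c, a01, n_grid=201, n_iter=200):
    """Iterate V^{N+1} = min{h, c(1 - pi_0) + T V^N} for M = 1 on a pi_0 grid.

    Assumed (hypothetical) model: finite alphabet, counting measure m,
    D_0(pi, x) = (1 - p) pi_0 f0[x] and D_1(pi, x) = (pi_1 + p pi_0) f1[x].
    """
    f0 = np.asarray(f0, dtype=float)
    f1 = np.asarray(f1, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid)       # pi_0 values; pi_1 = 1 - pi_0
    h = a01 * grid                             # h(pi) = a_01 * pi_0 when a_11 = 0
    V = h.copy()                               # V_0^0 = h
    history = [V]
    for _ in range(n_iter):
        TV = np.zeros_like(grid)
        for x in range(len(f0)):               # the integral over E becomes a sum
            D0 = (1.0 - p) * grid * f0[x]
            D1 = (1.0 - grid + p * grid) * f1[x]
            D = D0 + D1
            # Only x with D(pi, x) > 0 contribute; avoid 0/0 elsewhere.
            post0 = np.divide(D0, D, out=np.zeros_like(D), where=D > 0)
            TV += D * np.interp(post0, grid, V)
        V = np.minimum(h, c * (1.0 - grid) + TV)
        history.append(V)
    return grid, h, history

# Hypothetical parameters.
grid, h, hist = value_iteration(f0=[0.5, 0.3, 0.2], f1=[0.2, 0.3, 0.5],
                                p=0.05, c=0.1, a01=1.0)
V = hist[-1]
stop = np.isclose(V, h)        # heuristic membership test for the stopping region
print("largest grid pi_0 in the stopping region:", grid[stop].max())
```

The final checks in the tests mirror the properties used in this appendix. The iterates are nonincreasing in $N$ and satisfy $0 \le V_0^N \le h$. They are also concave in $\pi$ (Proposition 4.9), which survives the discretization because linear interpolation of concave grid values is concave.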
A.11. Proof of Corollary 4.12. The function $V_0(\pi)$ on the compact space $S^M$ is the limit of the sequence $\{V_0^N(\pi)\}_{N\ge0}$ of continuous functions, uniformly in $\pi \in S^M$ by Proposition 4.10. Therefore, it is continuous. □



A.12. Proof of Theorem 4.13. By Lemmas 4.4 and 4.1, we have that $(V_0^N)_{N\ge0}$ is a nonincreasing sequence of functions, bounded from above by the function $h$. Since $h(\cdot)$ and $h_j(\cdot)$ are continuous and since $V_0^N(\cdot)$, $N \ge 0$, are continuous on $S^M$ by Proposition 4.11, the set

$$\Gamma_N^{(j)} := \{\pi \in S^M \mid V_0^N(\pi) = h(\pi) = h_j(\pi)\}$$

is a closed subset of $S^M$ for each $N \ge 0$ and $j \in \mathcal{M}$.

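Fix $j \in \mathcal{M}$. Then $V_0^{N+1}(\pi) = h(\pi) = h_j(\pi)$ implies $V_0^N(\pi) = h(\pi) = h_j(\pi)$, because $V_0^{N+1} \le V_0^N \le h$; and therefore, $\Gamma_{N+1}^{(j)} \subseteq \Gamma_N^{(j)}$ for every $N \ge 0$. Hence, $(\Gamma_N^{(j)})_{N\ge0}$ is a nonincreasing sequence of closed subsets of $S^M$. Clearly, $\Gamma_N = \bigcup_{j=1}^{M}\Gamma_N^{(j)}$, $N \ge 0$, and $(\Gamma_N)_{N\ge0}$ is also a nonincreasing sequence of closed subsets of $S^M$. Moreover, since $V_0^N \searrow V_0$ by Proposition 4.10, the limit of the nonincreasing sequence $(\Gamma_N)_{N\ge0}$ is $\Gamma$; i.e., $\bigcap_{N=1}^{\infty}\Gamma_N = \Gamma$. Similarly, $\bigcap_{N=1}^{\infty}\Gamma_N^{(j)} = \Gamma^{(j)}$, $j \in \mathcal{M}$.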

Given $\pi \in S^M$, if the inequality $h_j(\pi) \le \min\{h(\pi), c(1-\pi_0)\}$ holds, then $h_j(\pi) \le h(\pi)$, which implies that $h_j(\pi) = h(\pi)$ because $h = \min_{i\in\mathcal{M}} h_i \le h_j$. Also,

$$h_j(\pi) \le \min\{h(\pi),\ c(1-\pi_0) + (TV_0)(\pi)\} = V_0(\pi).$$

This follows from the fact that $V_0 \ge 0$ implies $TV_0 \ge 0$ and from the optimality equation of Proposition 4.6. But, since $V_0 \le h$ on $S^M$, we have $V_0(\pi) = h_j(\pi) = h(\pi)$ and thus $\pi \in \Gamma^{(j)}$. As a corollary, since $h_j(e_j) = 0 \le \min\{h(e_j), c\}$, we have $e_j \in \Gamma^{(j)}$.



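In order to prove the convexity of $\Gamma^{(j)}$, take $\pi, \pi' \in \Gamma_N^{(j)}$ and show that $\lambda\pi + (1-\lambda)\pi' \in \Gamma_N^{(j)}$ for every $\lambda \in [0,1]$. Since $V_0^N(\cdot)$ is concave by Proposition 4.9, we have

$$\lambda V_0^N(\pi) + (1-\lambda)V_0^N(\pi') \le V_0^N(\lambda\pi + (1-\lambda)\pi') \le h(\lambda\pi + (1-\lambda)\pi') \le h_j(\lambda\pi + (1-\lambda)\pi') = \lambda h_j(\pi) + (1-\lambda)h_j(\pi') = \lambda V_0^N(\pi) + (1-\lambda)V_0^N(\pi').$$

Therefore, since the chain begins and ends with the same quantity, every inequality in it holds with equality, and we have

$$V_0^N(\lambda\pi + (1-\lambda)\pi') = h(\lambda\pi + (1-\lambda)\pi') = h_j(\lambda\pi + (1-\lambda)\pi')$$

and $\lambda\pi + (1-\lambda)\pi' \in \Gamma_N \cap \{\pi \in S^M \mid h(\pi) = h_j(\pi)\} = \Gamma_N^{(j)}$. Hence, $\Gamma_N^{(j)}$ is convex. Since an intersection of convex sets is again convex, $\Gamma^{(j)} = \bigcap_{N=1}^{\infty}\Gamma_N^{(j)}$ is convex.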

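Thus, we have shown that $\Gamma = \bigcup_{j=1}^{M}\Gamma^{(j)}$ is the union of $M$ nonempty closed convex subsets of $S^M$. Finally, consider $\pi(\lambda) = \lambda e_0 + (1-\lambda)e_j$ for $\lambda \in \big(0, \frac{c}{a_{0j}+c}\big]$. Note that $c > 0$ and $a_{0j} > 0$ imply that the interval $\big(0, \frac{c}{a_{0j}+c}\big]$ is nonempty. The inequality $\lambda \le \frac{c}{a_{0j}+c}$ implies that $c(1-\lambda) \ge \lambda a_{0j} = h_j(\pi(\lambda))$. Hence,

$$h(\pi(\lambda)) \le h_j(\pi(\lambda)) \le c(1-\lambda) \le c(1-\lambda) + (TV_0)(\pi(\lambda)),$$

and so $V_0(\pi(\lambda)) = h(\pi(\lambda))$ by Proposition 4.6. Therefore, $\Gamma \ni \pi(\lambda) \notin \{e_1, \ldots, e_M\}$. □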
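The final step uses only the elementary inequality $\lambda a_{0j} \le c(1-\lambda)$ for $0 < \lambda \le c/(a_{0j}+c)$. The short Python check below samples that interval and confirms the inequality, along with its failure just beyond the right endpoint. It assumes $a_{jj} = 0$, as in $h_j(e_j) = 0$ above, and the numerical values are hypothetical. It does not by itself show $\pi(\lambda) \in \Gamma$, which also needs $h \le h_j$ and $TV_0 \ge 0$.

```python
def lambda_max(c, a0j):
    """Right endpoint c / (a_{0j} + c) of the interval used for pi(lambda)."""
    return c / (a0j + c)

def inequality_holds(lam, c, a0j, tol=1e-12):
    """Check h_j(pi(lam)) = lam * a_{0j} <= c * (1 - lam), assuming a_{jj} = 0."""
    return lam * a0j <= c * (1.0 - lam) + tol

c, a0j = 0.5, 2.0                      # hypothetical cost parameters
lam_hi = lambda_max(c, a0j)            # 0.2 for these values
samples = [lam_hi * k / 1000 for k in range(1, 1001)]
print(all(inequality_holds(lam, c, a0j) for lam in samples))   # True
print(inequality_holds(1.01 * lam_hi, c, a0j))                  # False
```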



A.13. Proof of Lemma 4.14. For every $n \ge 0$, the limit $\lim_{N\to\infty}\gamma_n^N$ exists a.s. by Lemma 4.3. So, fix $n$ and take the limit as $N \to \infty$ of the expression in Lemma 4.4. Then apply Lemma 4.5(b) to obtain the result. □
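A.14. Proof of Theorem 4.15. Let us prove part (a) first. Note that

$$\sigma = \inf\{n \ge 0 \mid \Pi_n \in \Gamma\} = \inf\{n \ge 0 \mid V_0(\Pi_n) = h(\Pi_n)\} = \inf\{n \ge 0 \mid \gamma_n = Y_n\}.$$

The second equality follows from the definition of $\Gamma$, and the last equality follows from Lemma 4.14 and the definition of $Y_n$ in (3.5). Now, fix $n$ and recall from Lemma 4.2 that $\gamma_n = \max\{Y_n, \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]\}$. Then $\gamma_n = \mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n]$ on $\{\sigma > n\}$. So,

$$\mathbb{E}[\gamma_{(n+1)\wedge\sigma} \mid \mathcal{F}_n] = \mathbb{E}[\gamma_\sigma 1_{\{\sigma\le n\}} \mid \mathcal{F}_n] + \mathbb{E}[\gamma_{n+1}1_{\{\sigma>n\}} \mid \mathcal{F}_n] = \gamma_\sigma 1_{\{\sigma\le n\}} + 1_{\{\sigma>n\}}\mathbb{E}[\gamma_{n+1} \mid \mathcal{F}_n] = \gamma_\sigma 1_{\{\sigma\le n\}} + \gamma_n 1_{\{\sigma>n\}} = \gamma_{n\wedge\sigma}.$$

This establishes the martingale property of the stopped process $\{\gamma_{n\wedge\sigma}, \mathcal{F}_n\}_{n\ge0}$.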



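To prove part (b), we use part (a) and Lemma 4.1 to write

$$-V_0 = \sup_{\tau\in\mathcal{C}_0}\mathbb{E}Y_\tau = \gamma_0 = \mathbb{E}[\gamma_{n\wedge\sigma}] = \mathbb{E}[Y_\sigma 1_{\{\sigma\le n\}}] + \mathbb{E}[\gamma_n 1_{\{\sigma>n\}}].$$

Since $Y_n = -\sum_{k=0}^{n-1} c\big(1-\Pi_k^{(0)}\big) - h(\Pi_n) \le 0$ and $\gamma_n \le -\sum_{k=0}^{n-1} c\big(1-\Pi_k^{(0)}\big) \le 0$ for every $n$, we can use Fatou's lemma after taking $\limsup_{n\to\infty}$ of both sides to obtain

$$-V_0 \le \mathbb{E}[Y_\sigma 1_{\{\sigma<\infty\}}] + \mathbb{E}\Big[\big(\limsup_{n\to\infty}\gamma_n\big)1_{\{\sigma=\infty\}}\Big]. \tag{A.10}$$

Since $\limsup_{n\to\infty}\gamma_n \le \limsup_{n\to\infty} -\sum_{k=0}^{n-1} c\big(1-\Pi_k^{(0)}\big) = -\infty$ by Remark 3.3, and $-V_0 \ge -h > -\infty$, the inequality (A.10) implies that $\mathbb{P}\{\sigma = \infty\} = 0$. Therefore, the same inequality becomes $-V_0 = \sup_\tau \mathbb{E}Y_\tau \le \mathbb{E}Y_\sigma$. To show that $\sigma$ is optimal, we must prove that $\sigma \in \mathcal{C}_0$. Since $\sigma < \infty$ a.s., it is enough to show $\mathbb{E}Y_\sigma^- < \infty$, which is equivalent to showing that $\mathbb{E}\sigma < \infty$ by the discussion before equation (3.7).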



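Since $\mathbb{E}Y_\sigma \ge -V_0 > -\infty$, we indeed have $\mathbb{E}\sigma < \infty$:

$$-\infty < \mathbb{E}Y_\sigma = \mathbb{E}\Big[-\sum_{k=0}^{\sigma-1} c\big(1-\Pi_k^{(0)}\big) - h(\Pi_\sigma)\Big] \le -c\,\mathbb{E}\sigma + c\,\mathbb{E}\sum_{k=0}^{\sigma-1}\Pi_k^{(0)} \le -c\,\mathbb{E}\sigma + c\sum_{k=0}^{\infty}\mathbb{E}\Pi_k^{(0)} \le -c\,\mathbb{E}\sigma + c\sum_{k=0}^{\infty}(1-p)^k = -c\,\mathbb{E}\sigma + \frac{c}{p}$$

implies $\mathbb{E}\sigma < \infty$. Here, the last inequality follows from Proposition 3.2(a). This completes the proofs of parts (b) and (c). □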
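Rearranging the last display gives more than finiteness. Once $\sigma \in \mathcal{C}_0$ is known, $-V_0 \le \mathbb{E}Y_\sigma \le \sup_\tau \mathbb{E}Y_\tau = -V_0$, so $\mathbb{E}Y_\sigma = -V_0$. The same chain then yields an explicit bound on the expected stopping time, with the same form as the bound on $\mathbb{E}^\pi\tau_\varepsilon$ in the proof of Proposition 4.10. The following is a short derivation under the hypotheses of Theorem 4.15, using $V_0 \le h \le \|h\|$:

```latex
-V_0 = \mathbb{E}Y_\sigma \le -c\,\mathbb{E}\sigma + \frac{c}{p}
\quad\Longrightarrow\quad
\mathbb{E}\sigma \le \frac{V_0}{c} + \frac{1}{p} \le \frac{\|h\|}{c} + \frac{1}{p}.
```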